
preg_match_all() - PHP Regular Expressions (PCRE)

Lele · 1 year ago (2023-11-21) · 23 views · #Technical articles


(PHP 4, PHP 5, PHP 7)



preg_match_all(string $pattern, string $subject[, array &$matches[, int $flags = PREG_PATTERN_ORDER[, int $offset = 0]]]): int














<b>example: </b>, <div align=left>this is a test</div>
example: , this is a test
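
The two output lines above are what a PREG_PATTERN_ORDER call prints. The original listing did not survive on this page, so the following is a minimal sketch that assumes the input string and pattern shown; with this flag `$out[0]` holds the full matches and `$out[1]` the text captured by the first group.

```php
<?php
// Assumed input: two simple HTML elements side by side.
$subject = "<b>example: </b><div align=left>this is a test</div>";

// The U modifier makes the quantifiers ungreedy, so each element is matched separately.
preg_match_all("|<[^>]+>(.*)</[^>]+>|U", $subject, $out, PREG_PATTERN_ORDER);

// First line: the full matches. Second line: the contents of group 1.
echo $out[0][0] . ", " . $out[0][1] . "\n";
echo $out[1][0] . ", " . $out[1][1] . "\n";
```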





Array
(
    [0] => 
    [1] => bar
)
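
An empty first element followed by `bar` is the kind of result you get when a pattern reuses a group name. A small sketch, assuming the pattern and subject below: with `(?J)` duplicate names are allowed, and only the rightmost group of that name ends up under `$matches['match']`.

```php
<?php
// (?J) allows the same group name to appear more than once in a pattern.
preg_match_all(
    '/(?J)(?<match>foo)|(?<match>bar)/',
    'foo bar',
    $matches,
    PREG_PATTERN_ORDER
);

// Only the rightmost "match" group is stored under the name, so the
// first overall match ("foo") shows up as an empty string here.
print_r($matches['match']);
```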



<b>example: </b>, example: 
<div align=left>this is a test</div>, this is a test
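
The output above corresponds to PREG_SET_ORDER, where each element of the result is one complete match with its groups. A sketch under the same assumptions as before (input string and pattern are reconstructions, not taken from this page):

```php
<?php
$subject = "<b>example: </b><div align=left>this is a test</div>";

// With PREG_SET_ORDER, $out[0] is the first set of matches and $out[1] the second.
preg_match_all("|<[^>]+>(.*)</[^>]+>|U", $subject, $out, PREG_SET_ORDER);

echo $out[0][0] . ", " . $out[0][1] . "\n";
echo $out[1][0] . ", " . $out[1][1] . "\n";
```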



Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => foobarbaz
                    [1] => 0
                )

        )

    [1] => Array
        (
            [0] => Array
                (
                    [0] => foo
                    [1] => 0
                )

        )

    [2] => Array
        (
            [0] => Array
                (
                    [0] => bar
                    [1] => 3
                )

        )

    [3] => Array
        (
            [0] => Array
                (
                    [0] => baz
                    [1] => 6
                )

        )

)
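
A structure like the one above, where every match is paired with a number, comes from the PREG_OFFSET_CAPTURE flag. A minimal sketch, assuming the subject `foobarbaz` and a three-group pattern:

```php
<?php
// Each entry becomes [matched text, byte offset in the subject].
preg_match_all('/(foo)(bar)(baz)/', 'foobarbaz', $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
```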








matched: <b>bold text</b>
part 1: <b>
part 2: b
part 3: bold text
part 4: </b>

matched: <a href=howdy.html>click me</a>
part 1: <a href=howdy.html>
part 2: a
part 3: click me
part 4: </a>
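
The "matched / part N" listing above can be produced by a pattern that uses a backreference to pair each opening tag with its closing tag. This is a sketch that assumes the HTML fragment and pattern below; `\\2` refers back to the tag name captured by the second group.

```php
<?php
// Assumed input: two elements with different tag names.
$html = "<b>bold text</b><a href=howdy.html>click me</a>";

// Group 1: opening tag, group 2: tag name, group 3: contents, group 4: closing tag.
preg_match_all("/(<([\w]+)[^>]*>)(.*?)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);

foreach ($matches as $val) {
    echo "matched: " . $val[0] . "\n";
    echo "part 1: " . $val[1] . "\n";
    echo "part 2: " . $val[2] . "\n";
    echo "part 3: " . $val[3] . "\n";
    echo "part 4: " . $val[4] . "\n\n";
}
```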




Array
(
    [0] => Array
        (
            [0] => a: 1
            [1] => b: 2
            [2] => c: 3
        )

    [name] => Array
        (
            [0] => a
            [1] => b
            [2] => c
        )

    [1] => Array
        (
            [0] => a
            [1] => b
            [2] => c
        )

    [digit] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )

    [2] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )

)
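
Results keyed by both name and number, as above, come from named subpatterns. A minimal sketch, assuming a three-line subject of `key: digit` pairs:

```php
<?php
$str = "a: 1\nb: 2\nc: 3";

// Each named group is stored twice: once under its name and once under its number.
preg_match_all('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);
print_r($matches);
```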


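  • PCRE patterns
  • preg_quote() - Quote regular expression characters
  • preg_match() - Perform a regular expression match
  • preg_replace() - Perform a regular expression search and replace
  • preg_split() - Split a string by a regular expression
  • preg_last_error() - Return the error code of the last PCRE regex execution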
If you want to extract all {token}s from a string:

Array
(
    [0] => Array
        (
            [0] => {token1}
            [1] => {token2}
        )

)
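
The code for this note is missing from the page. One way to get the result shown, sketched with an assumed subject string and pattern:

```php
<?php
// Assumed input containing two {token}s.
$subject = "{token1} foo {token2} bar";

// A literal "{", then anything that is not "}", then a literal "}".
preg_match_all('/\{[^}]*\}/', $subject, $matches);
print_r($matches);
```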
The code that john at mccarthy dot net posted is not necessary. If you want your results grouped by individual match simply use:
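
The snippet this note refers to is not preserved here. Grouping by individual match is what the PREG_SET_ORDER flag does, so a sketch along these lines (pattern and subject are illustrative assumptions) is probably what was meant:

```php
<?php
// Each element of $sets is one match: [full text, group 1, group 2].
preg_match_all('/(\w): (\d)/', 'a: 1, b: 2', $sets, PREG_SET_ORDER);
print_r($sets);
```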

Be careful with this pattern match and a large input buffer in the preg_match_* functions.

If $buffer is 80+ KB in size, you'll end up with a segfault!
[89396.588854] php[4384]: segfault at 7ffd6e2bdeb0 ip 00007fa20c8d67ed sp 00007ffd6e2bde70 error 6 in libpcre.so.3.13.1[7fa20c8c3000+3c000]
This is due to PCRE recursion. It has been a known bug in PHP since 2008, but its source is not PHP itself but the PCRE library.
Rasmus Lerdorf has the answer: https://bugs.php.net/bug.php?id=45735#1365812629
"The problem here is that there is no way to detect run-away regular expressions 
here without huge performance and memory penalties. Yes, we could build PCRE in a 
way that it wouldn't segfault and we could crank up the default backtrack limit 
to something huge, but it would slow every regex call down by a lot. If PCRE 
provided a way to handle this in a more graceful manner without the performance 
hit we would of course use it."
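
A crash like the one above cannot be caught from PHP code, but the softer failure mode, where PCRE gives up after too much backtracking, can be detected. The sketch below is only an illustration: it lowers `pcre.backtrack_limit` on purpose and turns off `pcre.jit` (an assumption made to keep the demo predictable), then checks `preg_last_error()`. The pattern and subject are chosen to backtrack heavily and are not the ones from the note.

```php
<?php
// Assumption: these settings are lowered only to make the failure easy to reproduce.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '100');

// Nested repetition with no possible match forces a lot of backtracking.
$result = preg_match_all('/(?:\D+|<\d+>)*[!?]/', 'foobar foobar foobar', $m);

if ($result === false) {
    // false means the engine failed; preg_last_error() tells why.
    if (preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR) {
        echo "Backtrack limit was exhausted\n";
    } else {
        echo "Another PCRE error occurred\n";
    }
}
```

Checking for `false` rather than `0` matters here, because `0` only means that nothing matched.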
I needed a function to rotate the results of a preg_match_all query, and made this. Not sure if it exists.

Example - Take results of some preg_match_all query:
  [0] => Array
      [1] => Banff 
      [2] => Canmore
      [3] => Invermere
  [1] => Array
      [1] => AB 
      [2] => AB
      [3] => BC
  [2] => Array
      [1] => 51.1746254 
      [2] => 51.0938416
      [3] => 50.5065193
  [3] => Array
      [1] => -115.5719757 
      [2] => -115.3517761
      [3] => -116.0321884
  [4] => Array
      [1] => T1L 1B3 
      [2] => T1W 1N2
      [3] => V0B 2G0
Rotate it 90 degrees to group results as records:
  [0] => Array
      [1] => Banff 
      [2] => AB
      [3] => 51.1746254
      [4] => -115.5719757
      [5] => T1L 1B3
  [1] => Array
      [1] => Canmore
      [2] => AB
      [3] => 51.0938416
      [4] => -115.3517761
      [5] => T1W 1N2
  [2] => Array
      [1] => Invermere
      [2] => BC
      [3] => 50.5065193
      [4] => -116.0321884
      [5] => V0B 2G0
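
The function itself is not preserved on this page. The rotation described is a transpose of a two-dimensional array, which can be sketched as follows; `rotateMatches()` is a hypothetical name, and the sample pattern and data are assumptions loosely based on the place names above.

```php
<?php
// Hypothetical helper (not the original author's code): swap the two
// dimensions of an array so that $in[group][match] becomes $out[match][group].
function rotateMatches(array $source)
{
    $rotated = [];
    foreach ($source as $groupKey => $group) {
        foreach ($group as $matchKey => $value) {
            $rotated[$matchKey][$groupKey] = $value;
        }
    }
    return $rotated;
}

// Default PREG_PATTERN_ORDER result: one sub-array per capture group.
preg_match_all('/(\w+), (\w\w)/', 'Banff, AB; Canmore, AB; Invermere, BC', $m);

// After rotation: one sub-array per record.
$records = rotateMatches($m);
print_r($records);
```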
Here is an awesome online regex editor: https://regex101.com/
It helps you test your regular expressions (PCRE, JS, Python) with real-time highlighting of regex matches on the input data.
Here's some fleecy code to 1. validate RFC 2822 conformity of address lists and 2. extract the address specification (the part commonly known as 'email'). I wouldn't suggest using it for checking email input in forms, but it might be just what you want for other email applications. I know it can be optimized further, but that part I'll leave up to you nutcrackers. The total length of the resulting regex is about 30000 bytes. That is because it accepts comments. You can remove that by setting $cfws to $fws, and it shrinks to about 6000 bytes. Conformity checking strictly follows RFC 2822. Have fun and email me if you have any enhancements!
For parsing queries with entities use: 
Perhaps you want to find the positions of all anchor tags. This returns a two-dimensional array holding the starting and ending position of each tag.
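
The original code is missing. A sketch of the idea using PREG_OFFSET_CAPTURE follows; `anchorPositions()` is a hypothetical name, the pattern is a simplification that assumes anchors are not nested, and the end position is exclusive.

```php
<?php
// Hypothetical helper: returns [start, end] byte offsets for every <a ...>...</a> element.
function anchorPositions($html)
{
    $positions = [];
    // i: case-insensitive, s: let "." match newlines inside the element.
    if (preg_match_all('/<a\b[^>]*>.*?<\/a>/is', $html, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[0] as $match) {
            $start = $match[1];
            $positions[] = [$start, $start + strlen($match[0])];
        }
    }
    return $positions;
}

$html = 'x <a href="a.html">A</a> y <a href="b.html">B</a>';
print_r(anchorPositions($html));
```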
To count the length of a UTF-8 string I use:
$count = preg_match_all("/[[:print:]\pL]/u", $str, $pockets);
[:print:] - printing characters, including space
\pL - any Unicode letter
/u - treat pattern and subject as UTF-8
Other Unicode character properties are listed at http://www.pcre.org/pcre.txt
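
A short usage sketch of the expression above, with a sample string that is an assumption for illustration. It shows the difference from `strlen()`, which counts bytes; note that characters matching neither class would not be counted.

```php
<?php
// "héllo": 5 characters, but 6 bytes in UTF-8 (the é takes two bytes).
$str = "h\u{00e9}llo";

// preg_match_all() returns the number of full matches, one per character here.
$count = preg_match_all("/[[:print:]\pL]/u", $str, $pockets);

echo $count, " characters, ", strlen($str), " bytes\n";
```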
Here is a way to match everything on the page, performing an action for each match as you go. I had used this idiom in other languages, where its use is customary, but in PHP it seems to be not quite as common.

Note that the offsets returned are byte values (not necessarily number of characters) so you'll have to make sure the data is single-byte encoded. (Or have a look at paolo mosna's strByte function on the strlen manual page).
I'd be interested to know how this method performs speedwise against using preg_match_all and then recursing through the results.
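
The code for this idiom is not preserved here. One way to write it is a loop around `preg_match()` with PREG_OFFSET_CAPTURE, moving the offset past each match; `eachMatch()` is a hypothetical name and the whole block is a sketch, not the original author's code.

```php
<?php
// Hypothetical helper: calls $action(text, byteOffset) for every match, in order.
function eachMatch($pattern, $subject, callable $action)
{
    $offset = 0;
    $count = 0;
    $length = strlen($subject);
    while ($offset <= $length
        && preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        $text = $m[0][0];
        $start = $m[0][1];
        $action($text, $start);
        $count++;
        // Continue after this match; step at least one byte so that
        // an empty match cannot cause an endless loop.
        $offset = $start + max(strlen($text), 1);
    }
    return $count;
}

$seen = [];
eachMatch('/\d+/', 'a1 b22 c333', function ($text, $start) use (&$seen) {
    $seen[] = [$text, $start];
});
print_r($seen);
```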
I have made up a simple function to extract a phone number from a string.
I am not sure how good it is, but it works.
It keeps only the digits 0-9 and the "-", " ", "(", ")" and "." characters.
As far as I know, these are the most widely used characters in a phone number.
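
The function is missing from the page. A sketch of the approach: `extractPhoneNumbers()` is a hypothetical name, the character set follows the note, and the requirement that a number starts with a digit or "(", ends with a digit and has at least seven characters is an added assumption to avoid matching stray punctuation.

```php
<?php
// Hypothetical helper: returns every phone-number-like run found in $text.
function extractPhoneNumbers($text)
{
    // Allowed characters in between: digits, space, "(", ")", "." and "-".
    preg_match_all('/[0-9(][0-9 ().-]{5,}[0-9]/', $text, $matches);
    return $matches[0];
}

print_r(extractPhoneNumbers('Call (555) 123-4567 or 555.987.6543 today.'));
```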
Please note that the function of "mail at SPAMBUSTER at milianw dot de" can result in invalid XHTML in some cases. I think I used it in the right way, but my result is something like this:
foo foo foo foo 
Correct me if I'm wrong.
I'll see when there's time to fix that. -.-
If you'd like to include DOUBLE QUOTES on a regular expression for use with preg_match_all, try ESCAPING THRICE, as in: \\\"
For example, the pattern:
[\s\w\/=\\\"]*/' Should be able to match:
a b
.. with all there is under those table tags. I'm not really sure why this is so, but I tried just the double quote and one or even two escape characters and it won't work. In my frustration I added another one and then it's cool.
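
How many backslashes are needed depends on the PHP string syntax, not on the regex: a double quote is an ordinary character to PCRE. The sketch below assumes a `<table>` tag as the thing being matched (the full pattern from the note is not preserved) and compares a single-quoted and a double-quoted version of the same expression.

```php
<?php
// Assumed sample input with double-quoted attribute values.
$html = '<table border="1" class="a b">';

// Single-quoted pattern: the double quote needs no escaping at all.
preg_match_all('/<table[\s\w\/="]*>/', $html, $single);

// Double-quoted pattern: only PHP needs the quote (and the backslashes) escaped.
preg_match_all("/<table[\\s\\w\\/=\"]*>/", $html, $double);

var_dump($single[0] === $double[0]);
```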
When a regex can match both a longer and a shorter version of a string,
only one of those versions is captured.
When a match occurs at a given position in the string,
only one match is saved in $matches[0] for that position.
If ? is used, the regex is greedy and captures the longer version;
if | is used, the first alternative that matches is captured:

One might expect ['ab', 'abc'] in $m[0] for both, but that is not the case;
they actually output [['ab']] and [['abc']]:
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(2) "ab"
  }
}
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(3) "abc"
  }
}
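
The two calls behind this output are missing from the page. Judging by the description, they would look roughly like this (patterns and subject are assumptions chosen to reproduce the two results):

```php
<?php
// Alternation: alternatives are tried left to right, so "ab" wins at position 0.
preg_match_all('/ab|abc/', 'abc', $m);
var_dump($m);

// Optional "c": the quantifier is greedy, so the longer "abc" is taken.
preg_match_all('/abc?/', 'abc', $n);
var_dump($n);
```

Either way there is one match per starting position, because the search resumes after the end of the previous match.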
I had been crafting and testing some regexp patterns online using the tools Regex101 and a `preg_match_all()` tester and found that the regexp patterns I wrote worked fine on them, just not in my code.
My problem was not double-escaping backslash characters:
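
The example that followed is not preserved. The usual form of this problem is that a backslash has to be escaped once for the regex engine and once more for the PHP string literal, which online testers do not require. A small sketch with an assumed Windows-style path as input:

```php
<?php
// In PHP source this is the string C:\temp\file.txt (two backslashes).
$subject = 'C:\\temp\\file.txt';

// The regex needs \\ to mean one literal backslash, and the PHP string
// literal needs each of those escaped again: four backslashes in the source.
$count = preg_match_all('/\\\\/', $subject, $m);

echo $count, "\n";
```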


