Using The e Modifier In PHP preg_replace

The PHP function preg_replace() is powerful in its own right, but extra depth can be added with the e modifier. Take the following bit of code, which picks out the lowercase letters in a string and replaces them with the letter X.

$something = 'df1gdf2gdf3sgdfg';
$something = preg_replace("/([a-z]*)/", "X", $something);
echo $something; // prints XX1XX2XX3XX
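
The doubled X in that output comes from the * quantifier, which also matches the empty string left at the end of each run of letters. As a small sketch (the variable names $star and $plus are only for illustration), comparing * with + shows the difference:

```php
<?php
$something = 'df1gdf2gdf3sgdfg';

// [a-z]* matches each run of letters, and then matches again as an
// empty string at the position where the run ended.
$star = preg_replace('/[a-z]*/', 'X', $something);

// [a-z]+ needs at least one letter, so each run becomes a single X.
$plus = preg_replace('/[a-z]+/', 'X', $something);

echo $star, "\n"; // prints XX1XX2XX3XX
echo $plus, "\n"; // prints X1X2X3X
```

If one X per run of letters is what you want, + is probably the better quantifier here.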

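This is simple enough, but the e modifier allows us to use PHP functions within the replacement parameter, because the replacement string is evaluated as PHP code once the backreferences have been substituted. Be aware that the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, where preg_replace_callback() takes its place, so the e examples here need an older version of PHP. The following bit of code turns all of the letters in a string upper case by using the strtoupper() PHP function.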

$something = 'df1gdf2gdf3sgdfg';
$something = preg_replace("/([a-z]*)/e", "strtoupper('\\1')", $something);
echo $something; // prints DF1GDF2GDF3SGDFG
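
On PHP versions where the e modifier is not available, the same result can be had with preg_replace_callback(). This is a minimal sketch, assuming PHP 5.3 or later for the anonymous function; it is an alternative to the code above rather than the technique this post describes.

```php
<?php
$something = 'df1gdf2gdf3sgdfg';

// The callback receives the match as an array: $matches[0] is the
// whole matched text, so no PHP code is built from a string.
$something = preg_replace_callback(
    '/[a-z]+/',
    function (array $matches) {
        return strtoupper($matches[0]);
    },
    $something
);

echo $something; // prints DF1GDF2GDF3SGDFG
```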

Here is another example. This time the pattern also matches digits, so the whole string is a single match, and the original string is appended after the upper case version.

$something = 'df1gdf2gdf3sgdfg';
$something = preg_replace("/([a-z0-9]*)/e", "strtoupper('\\1').'\\1'", $something);
echo $something; // prints DF1GDF2GDF3SGDFGdf1gdf2gdf3sgdfg

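Notice that when using the e modifier it is important to quote the replacement properly, with single quotes inside the double-quoted string. The replacement as a whole is evaluated as PHP after the backreferences have been substituted, so if you don't put single quotes around the backreferences the matched text is treated as a bare word and you will get PHP complaining about constants. PHP also escapes any single quotes, double quotes, backslashes and NULL characters in the backreference text before the replacement is evaluated, so extra backslashes can show up in the output.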
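
For comparison, a callback sidesteps the quoting question because the matched text arrives as a plain string and is never evaluated as PHP. The sketch below repeats the "upper case plus original" idea on a made-up input that contains both kinds of quote; the input string and pattern are assumptions chosen for the demonstration.

```php
<?php
// Hypothetical input containing both quote styles.
$something = 'say "hi" to o\'brien';

$something = preg_replace_callback(
    '/^.+$/',
    function (array $matches) {
        // $matches[0] is the raw matched text, quotes included.
        return strtoupper($matches[0]) . $matches[0];
    },
    $something
);

echo $something; // prints SAY "HI" TO O'BRIENsay "hi" to o'brien
```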

For a more complex example I modified the createTextLinks() function that I wrote about recently on the site. The original function found any URLs within a larger string and turned them into links. The modified function does the same thing, except that the link text is shortened using the shortenurl() function.

$longurl = "there is the new site http://www.google.co.uk/search?aq=f&num=100&hl=en&client=firefox-a&channel=s&rls=org.mozilla%3Aen-US%3Aofficial";
 
function createShortTextLinks($str='') {
 
 if($str=='' or !preg_match('/(http|www\.|@)/im', $str)){
  return $str;
 }
 
 // replace links:
 $str = preg_replace("/([ \t]|^)www\./im", "\\1http://www.", $str);
 $str = preg_replace("/([ \t]|^)ftp\./im", "\\1ftp://ftp.", $str);
 
 $str = preg_replace("/(https?:\/\/[^ )\r\n!]+)/eim", "'<a href=\"\\1\" title=\"\\1\">'.shortenurl('\\1').'</a>'", $str);
 
 $str = preg_replace("/(ftp:\/\/[^ )\r\n!]+)/eim", "'<a href=\"\\1\" title=\"\\1\">'.shortenurl('\\1').'</a>'", $str);
 
 $str = preg_replace("/([-a-z0-9_]+(\.[_a-z0-9-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)+))/eim", "'<a href=\"mailto:\\1\" title=\"Email \\1\">'.shortenurl('\\1').'</a>'", $str);
 
 $str = preg_replace("/(\&)/im","\\1amp;", $str);
 
 return $str;
}
 
function shortenurl($url){
 if(strlen($url) > 45){
  return substr($url, 0, 30)."[...]".substr($url, -15);
 }else{
  return $url;
 }
}
 
echo createShortTextLinks($longurl);
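
Since the e modifier is not available from PHP 7.0 onwards, here is a rough sketch of the same function built on preg_replace_callback() instead. The name createShortTextLinksCallback() and the $makeLink closure are made up for this example, and the patterns are copied from the function above without any other changes, so it keeps the same limitations (for instance, an existing &amp; in the input is encoded again).

```php
<?php
// Same helper as above.
function shortenurl($url) {
    if (strlen($url) > 45) {
        return substr($url, 0, 30) . '[...]' . substr($url, -15);
    }
    return $url;
}

// Hypothetical callback-based variant of createShortTextLinks().
function createShortTextLinksCallback($str = '') {
    if ($str == '' || !preg_match('/(http|www\.|@)/im', $str)) {
        return $str;
    }

    // Add a scheme to bare www. and ftp. hosts.
    $str = preg_replace('/([ \t]|^)www\./im', '\\1http://www.', $str);
    $str = preg_replace('/([ \t]|^)ftp\./im', '\\1ftp://ftp.', $str);

    // Build the link from the captured URL, shortening only the visible text.
    $makeLink = function (array $m) {
        return '<a href="' . $m[1] . '" title="' . $m[1] . '">'
            . shortenurl($m[1]) . '</a>';
    };
    $str = preg_replace_callback('/(https?:\/\/[^ )\r\n!]+)/im', $makeLink, $str);
    $str = preg_replace_callback('/(ftp:\/\/[^ )\r\n!]+)/im', $makeLink, $str);

    // Email addresses become mailto: links.
    $str = preg_replace_callback(
        '/([-a-z0-9_]+(\.[_a-z0-9-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)+))/im',
        function (array $m) {
            return '<a href="mailto:' . $m[1] . '" title="Email ' . $m[1] . '">'
                . shortenurl($m[1]) . '</a>';
        },
        $str
    );

    // Encode ampersands, as the original function does.
    $str = preg_replace('/(\&)/im', '\\1amp;', $str);

    return $str;
}

$longurl = "there is the new site http://www.google.co.uk/search?aq=f&num=100&hl=en&client=firefox-a&channel=s&rls=org.mozilla%3Aen-US%3Aofficial";
echo createShortTextLinksCallback($longurl);
```

Because the callback works on the raw match, no quotes in the matched text are escaped, and nothing from the input string is ever run as PHP.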

 

Comments

Any chance someone can tell me how to shorten the URL if the link is already inserted? So if I have a text like "some text <a href="http://VERY_LONG_URL">http://VERY_LONG_URL</a> some more text" and I want to shorten the URL only in the visible part to something like: "some text <a href="http://VERY_LONG_URL">http://SHORT_URL</a>". I just cannot get my brain to understand these regular expressions well enough to do this - please help! Chris
You want to replace the regular expression so that it matches any string that looks like a URL and is in between a < and a >. This ought to work: (?:>)(https?:\/\/[^ )<\r\n!]+)(?:<) This can be plugged into the e modifier like this: $str = preg_replace("/(?:>)(https?:\/\/[^ )<\r\n!]+)(?:<)/eim", "'.shortenurl('\\1').'", $str); Let me know how you get on!
Philip Norton
if I change your suggestion to: $str = preg_replace("/(?:>)(https?:\/\/[^ )<\r\n!]+)(?:<)/","'.shortenurl('\\1').'", $str); it seems to work - but I guess the "" should not be taken out by the regexp and then added manually again .. Chris
Cool, not bad for a quick guess. Wordpress messed around with your input, I think I corrected it, but let me know if I got it wrong.
Philip Norton

Actually not quite - I had to add a greater than before the shortenurl and a less than behind - cause the regexp did take them out from the a href and end a tags ... I just have one issue with the above statement - I found that if I have something like http://this.is.a.very.long.url the script will shorten the URL which results in a loss of the original information since there is no link on it that stays untouched. So would you be able to modify the statement that it only matches something like:

<a href="SOME_URL">SOME_URL</a>

To make sure I only catch links with the URL as the text and not just URLs that don't have a link associated with them? Thanks so much Chris

hmm Wordpress again messed up my text - so one more try: when I have something like "less than" "p" "greater than" URL "less than" "p" "greater than" - the statement will shorten the URL resulting in a loss of information since there is no a href at all. I would like to only match something like "less than" "a href=" URL SOME_ADDITIONAL_PARAMETERS "greater than" URL "less than" "a" "greater than" would that be possible in order to make sure I only shorten URLs that actually have the same URL as link associated with it? Chris
Despite the best efforts of Wordpress I get what you mean. You just need to add a rule at the start of the code to spot an opening a tag, no matter what it contains, at the start of the pattern. Try giving this a go.
(?:<[^\\]a.*?>)(https?:\/\/[^ )<\r\n!]+)(?:<)
If you are interested, I use a tool called rework to test my regular expressions. Take a look - http://osteele.com/tools/rework/. In my experience, the easy part of writing regular expressions is matching the things you want; the difficult part is stopping them from matching things you don't want.
Philip Norton
Hmm - just tried your last version on the page you posted - I always get no match. I should really try to dig in these regular expressions to understand why it is not working ... Chris
Try removing that bit at the start, and using \\2 instead of \\1. The ?: makes a group non-capturing, which means "match this, but don't store it as a backreference", and it can lead to some problems on some systems due to lack of support. Like this...
(<a.*?>)(https?:\/\/[^ )<\r\n!]+)(<)
If you want to learn regular expressions quickly I can recommend getting Ben Forta's book Regular Expressions In 10 Minutes - ISBN 0672325667. I read that book and it all became clear, and it isn't as heavy going as some other books. I now use regular expressions every day and they don't scare me as much!
Philip Norton
Wow - that looks good - I am about there - I now used the following PHP command: $str = preg_replace("/(<a>)(https?:\/\/[^ )<\r\n!]+)()/eim", "'\\1'.shortenurl('\\2').'\\3'", $str); and apart from some escape "\" in front of every " in the output - the result is perfect. Thanks so much for all your help Chris
I have a string which has lots of links/URLs. I want to replace all of the links/URLs except for some of them. I don't know how to do it. Can anybody please help me ASAP? Thank you in advance.