Preparing HTML And PHP Code For Publishing On Websites

I talked a while ago about Adding Code To Wordpress Blogs And Comments, but I decided that it needed a bit of code to do this automatically.

So here it is, prepared by the text processor.

<form method="post" action="http://www.hashbangcode.com/examples/text-process/text.php">
    <textarea name="text" rows="10" cols="80" wrap="off"></textarea>
    <input type="submit" value="Process" />
</form>
 
<?php
if ( isset($_POST["text"]) ) {
    $text   = $_POST["text"];
    // Remove any backslashes that magic quotes added to the posted text.
    $text   = stripslashes( $text );
    // Patterns and replacements are paired by position and applied in order.
    $input  = array ( "/&/", "/'/", "/\"/", "/</", "/>/", "/\t/", "/(?<=\s)\x20|\x20(?=\s)/", "/^\s$/m", "/&/", "/\r\n/" );
    $output = array ( "&amp;", "&#39;", "&quot;", "&lt;", "&gt;", "&nbsp;&nbsp;&nbsp;&nbsp;", "&nbsp;", "&nbsp;<br />", "&amp;", "<br />" );
    $temp = preg_replace($input, $output, $text);
    echo '<div style="border:1px solid grey;">'.$temp.'</div>';
}
?>

There seems to be rather a lot going on here, but the process is quite simple. The preg_replace() function accepts arrays for its pattern and replacement parameters. When you pass arrays, they are matched up by position, so that matches for the second item in the input array are replaced by the second item in the output array. The pairs are applied in order, each one working on the result of the previous replacement.
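As a minimal sketch of that pairing, the snippet below runs two of the patterns over a made-up sample string (the variable names and the sample are assumptions for illustration only). It also runs the same pairs in the opposite order to show why the sequence matters.

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$text = "a < b & c";

// Pairs are matched by position: '/&/' goes with '&amp;', '/</' with '&lt;'.
$result = preg_replace(array('/&/', '/</'), array('&amp;', '&lt;'), $text);
echo $result, "\n"; // a &lt; b &amp; c

// Reversed order: the & inside the freshly written &lt; gets encoded too.
$reversed = preg_replace(array('/</', '/&/'), array('&lt;', '&amp;'), $text);
echo $reversed, "\n"; // a &amp;lt; b &amp; c
```

Because each pair works on the output of the one before it, the position of the ampersand patterns in the array decides whether the entities end up encoded once or twice.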

So here is a list of the patterns I am matching and what each one is replaced with.

  • /&/ This matches any ampersand, which we replace with its encoded form, &amp;.
  • /'/ Find single quotes and encode them as &#39;.
  • /\"/ Find double quotes and encode them as &quot;.
  • /</ This matches every < and replaces it with &lt;.
  • />/ Same as above but the other way around; in this case the equivalent is &gt;.
  • /\t/ Next we start matching white space. The first step is to find all tab characters and replace each one with four &nbsp; entities, like this: &nbsp;&nbsp;&nbsp;&nbsp;
  • /(?<=\s)\x20|\x20(?=\s)/ Next we look for any space character that has a white space character before or after it and replace it with a non-breaking space, &nbsp;.
  • /^\s$/m This matches any otherwise empty line (one holding only a single white space character, such as the carriage return of a blank line in Windows-format text). It is replaced with a single &nbsp;, and in order to keep the code laid out as it was posted we also add a <br /> tag, so the final output is &nbsp;<br />.
  • /&/ Now that everything is encoded, we encode all of the & characters a second time. The script prints its result to an HTML page, so every & has to be written as &amp; for the browser to display the encoded text instead of interpreting it.
  • /\r\n/ Finally, we find all of the new line sequences and convert them to <br /> tags. You might want to change this to just \n if your text uses Unix-style line endings.
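The ten steps above can be sketched as a single function. The name encode_for_display() and the sample string are hypothetical, and the patterns are written in single quotes so that the backslash sequences reach the regular expression engine untouched; otherwise the arrays are the ones from the listing.

```php
<?php
// Hypothetical helper wrapping the ten replacements described in the list.
function encode_for_display($text)
{
    $input  = array('/&/', "/'/", '/"/', '/</', '/>/', '/\t/',
                    '/(?<=\s)\x20|\x20(?=\s)/', '/^\s$/m', '/&/', '/\r\n/');
    $output = array('&amp;', '&#39;', '&quot;', '&lt;', '&gt;',
                    '&nbsp;&nbsp;&nbsp;&nbsp;', '&nbsp;', '&nbsp;<br />',
                    '&amp;', '<br />');
    return preg_replace($input, $output, $text);
}

// Made-up two-line sample with a Windows line ending.
$sample  = "<b>\r\n</b>";
$encoded = encode_for_display($sample);
echo $encoded; // &amp;lt;b&amp;gt;<br />&amp;lt;/b&amp;gt;
```

A browser rendering that output shows &lt;b&gt;, a line break, and &lt;/b&gt; as visible text, which is the encoded form ready to be copied into a post.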

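Before we do any of this we pass the text through the stripslashes() function. On servers with magic quotes enabled (a feature that was removed in PHP 5.4), PHP adds a backslash before every " and ' character in POST data, and this call removes them again. If magic quotes are off, leave the call out, otherwise genuine backslashes in the posted code are stripped as well.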
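A small sketch of what stripslashes() does in both situations, using made-up strings (the variable names are assumptions for illustration). The first string imitates what magic quotes would hand to the script; the second is code that legitimately contains a backslash.

```php
<?php
// What magic quotes would have produced for the input: He said "hi"
$escaped  = 'He said \"hi\"';
$restored = stripslashes($escaped);
echo $restored, "\n"; // He said "hi"

// Without magic quotes, the same call removes a backslash that belongs there.
$regex   = '/\t/';
$damaged = stripslashes($regex);
echo $damaged, "\n"; // /t/
```

The second case is the reason to apply stripslashes() only when the server really is adding slashes to the posted text.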
