Revisiting filter_var() and FILTER_VALIDATE_URL

Quite a while ago I looked at using the filter_var() function to validate URLs with the FILTER_VALIDATE_URL filter. Someone pointed out recently that this function has not only changed since the initial release, but that a number of flags can be added to change the way it works. Here are the flags available.

  • FILTER_FLAG_SCHEME_REQUIRED Require the scheme (e.g. http://, ftp://) within the URL.
  • FILTER_FLAG_HOST_REQUIRED Require the host of the URL (e.g. www.google.com).
  • FILTER_FLAG_PATH_REQUIRED Require a path after the host of the URL.
  • FILTER_FLAG_QUERY_REQUIRED Require a query at the end of the URL (e.g. ?key=value).

These flags can be passed as the third argument of the normal filter_var() call to change the result. Take the FILTER_FLAG_PATH_REQUIRED flag as an example.

filter_var('http://www.bbc.co.uk', FILTER_VALIDATE_URL); // returns http://www.bbc.co.uk (which means it validates)
filter_var('http://www.bbc.co.uk', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED); // returns false
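
To see why the second call fails, it can help to look at how PHP splits the URL into parts. The following is only a rough sketch: parse_url() is used here purely for illustration, hasRequiredPath() is a made-up helper name, and the trailing-slash URL is an extra example of mine.

```php
<?php
// Hypothetical helper: does the URL validate when a path is required?
function hasRequiredPath($url) {
    return filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED) !== false;
}

// A bare host has no 'path' component, so the flag rejects it.
var_dump(array_key_exists('path', parse_url('http://www.bbc.co.uk')));  // bool(false)
var_dump(hasRequiredPath('http://www.bbc.co.uk'));                      // bool(false)

// A single trailing slash already counts as a path.
var_dump(array_key_exists('path', parse_url('http://www.bbc.co.uk/'))); // bool(true)
var_dump(hasRequiredPath('http://www.bbc.co.uk/'));                     // bool(true)
```

The helper compares against false with !== because filter_var() returns the original string on success rather than TRUE.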

It is also possible to use multiple flags by combining them with the bitwise OR operator (|). For example, if you want to require both a path and a query in the URL then you would use the following snippet.

filter_var('http://www.bbc.co.uk', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED); // returns false
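
Combining works because each flag is an integer with its own bit, so | merges them into a single value. As a small sketch (the example.com URLs are only illustrative), this shows the combined value being checked and then used, both directly and through an options array.

```php
<?php
// Combine the two flags into one integer bitmask.
$required = FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED;

// Both bits can be read back out of the combined value.
var_dump(($required & FILTER_FLAG_PATH_REQUIRED) === FILTER_FLAG_PATH_REQUIRED);   // bool(true)
var_dump(($required & FILTER_FLAG_QUERY_REQUIRED) === FILTER_FLAG_QUERY_REQUIRED); // bool(true)

// A path but no query: rejected.
$pathOnly = filter_var('http://www.example.com/index.html', FILTER_VALIDATE_URL, $required);
var_dump($pathOnly); // bool(false)

// A path and a query: the URL string is returned.
$withQuery = filter_var('http://www.example.com/index.html?q=123', FILTER_VALIDATE_URL, $required);
var_dump($withQuery);

// The same flags can also be passed in an options array.
$viaArray = filter_var(
    'http://www.example.com/index.html?q=123',
    FILTER_VALIDATE_URL,
    array('flags' => $required)
);
var_dump($viaArray);
```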

So I thought I would try and test these URL filters as fully as I could and print the outcome in a truth table. To this end I created a bunch of URLs that I could test. Each item in the array contains an empty array that the results of each test will be placed into.

$urls = array(
    'http://www.bbc.co.uk' => array(),
    'http://www.hashbangcode.com' => array(),
    'http://www.hashbangcode.com/blog' => array(),
    'http://www.example.com/index.html#anchor' => array(),
    'http://www.example.com/index.html?q=123' => array(),    
    'example.com' => array(),
    'www.example.com' => array(),
    'www.example.com/blog' => array(),
    'www.example.com/index.html?q=123' => array(),    
    '/index.html?q=123' => array(),     
    'https://www.example.com/' => array(),
    'https://localhost' => array(),    
    'https://localhost/' => array(),
    'https://127.0.0.1/' => array(),    
    'http://.com' => array(),
    'http://...' => array(),
    'http://' => array(),
    'http://i\'me really trying to break this url!!!"£$"%$&*()' => array()
);

I then created a list of validation flags, including a NULL value for testing no flag and the different combinations of the four flags. You might ask yourself why I used both a string and the value in this array. The reason is that the flags are actually just integer constants, so when I tried to print them out later on I found that the integer value was printed instead of the flag name. So, as a compromise, I used a string representation of the flag as the array key and the actual flag as the value.

$flags = array(
    'Null' => NULL,
    'FILTER_FLAG_SCHEME_REQUIRED' => FILTER_FLAG_SCHEME_REQUIRED,
    'FILTER_FLAG_HOST_REQUIRED' => FILTER_FLAG_HOST_REQUIRED,
    'FILTER_FLAG_PATH_REQUIRED' => FILTER_FLAG_PATH_REQUIRED,
    'FILTER_FLAG_QUERY_REQUIRED' => FILTER_FLAG_QUERY_REQUIRED,
    'FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_HOST_REQUIRED' => FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_HOST_REQUIRED,
    'FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_PATH_REQUIRED' => FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_PATH_REQUIRED,
    'FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_QUERY_REQUIRED' => FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_QUERY_REQUIRED,
    'FILTER_FLAG_HOST_REQUIRED | FILTER_FLAG_PATH_REQUIRED' => FILTER_FLAG_HOST_REQUIRED | FILTER_FLAG_PATH_REQUIRED,
    'FILTER_FLAG_HOST_REQUIRED | FILTER_FLAG_QUERY_REQUIRED' => FILTER_FLAG_HOST_REQUIRED | FILTER_FLAG_QUERY_REQUIRED,
    'FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED' => FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED,
    'FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_HOST_REQUIRED | FILTER_FLAG_PATH_REQUIRED' => FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_HOST_REQUIRED | FILTER_FLAG_PATH_REQUIRED,
    'FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_HOST_REQUIRED | FILTER_FLAG_QUERY_REQUIRED' => FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_HOST_REQUIRED | FILTER_FLAG_QUERY_REQUIRED,
    'FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED' => FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED,
    'FILTER_FLAG_HOST_REQUIRED | FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED' => FILTER_FLAG_HOST_REQUIRED | FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED,
    'FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_HOST_REQUIRED | FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED' => FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_HOST_REQUIRED | FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED,
);
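
Typing every combination by hand is easy to get wrong, so here is a rough sketch of how the same kind of list could be generated instead. The function name flagCombinations() is made up for this example, and it uses 0 rather than NULL for the "no flag" entry on the assumption that 0 means no bits set. The defined() check is there because the scheme and host flags were removed in later PHP versions.

```php
<?php
// Hypothetical helper: build every combination of the given flags,
// keyed by a printable label such as "A | B".
function flagCombinations(array $named) {
    $names = array_keys($named);
    $count = count($names);
    $combos = array('Null' => 0); // 0 = no flags set (assumption, see above)

    // Each value of $mask selects a different subset of the flags.
    for ($mask = 1; $mask < (1 << $count); $mask++) {
        $labels = array();
        $value = 0;
        for ($i = 0; $i < $count; $i++) {
            if ($mask & (1 << $i)) {
                $labels[] = $names[$i];
                $value |= $named[$names[$i]];
            }
        }
        $combos[implode(' | ', $labels)] = $value;
    }
    return $combos;
}

// Only use the flag constants that exist in the running PHP version.
$available = array();
foreach (array(
    'FILTER_FLAG_SCHEME_REQUIRED',
    'FILTER_FLAG_HOST_REQUIRED',
    'FILTER_FLAG_PATH_REQUIRED',
    'FILTER_FLAG_QUERY_REQUIRED',
) as $name) {
    if (defined($name)) {
        $available[$name] = constant($name);
    }
}

$generated = flagCombinations($available);
echo count($generated) . " flag combinations\n";
```

With all four flags available this gives 16 entries: the empty case plus 15 combinations.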

These arrays were then pulled together and tested in the following way.

// Do Filtering
foreach ($urls as $url => $data) {
    foreach ($flags as $textFlag => $flag) {
        $urls[$url][$textFlag] = (filter_var($url, FILTER_VALIDATE_URL, $flag) === FALSE)? 'FALSE' : 'TRUE';
    }
}

The results of these tests were then printed in two different tables, the first using the flag as the header and the second using the URL as the header. Here is the code that printed out the tables.

// Print table with URLs as rows
$header = '<tr><th>URL</th><th>';
$header .= implode('</th><th>' , array_keys($flags)) . '</th>';
$header .= '</tr>';
$rows = '';

foreach ($urls as $url => $data) {
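    // Escape the URL so that quotes and other special characters cannot break the table markup
    $rows .= '<tr><td>' . htmlspecialchars($url) . '</td>';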
    foreach ($flags as $textFlag => $flag) {
        $colour = ($urls[$url][$textFlag] == 'TRUE')? 'green':'red';
        $rows .= '<td style="background-color:' . $colour . ';">' . $urls[$url][$textFlag] . '</td>';
    }
    $rows .= '</tr>';
}

echo '<table border="1">' . $header . $rows . '</table>';

echo '<br />';

// Print table with flags as rows
$header = '<tr><th>FLAG</th><th>';
// Escape each URL before it is used as a column heading
$header .= implode('</th><th>' , array_map('htmlspecialchars', array_keys($urls))) . '</th>';
$header .= '</tr>';
$rows = '';

foreach ($flags as $textFlag => $flag) {
    $rows .= '<tr><td>' . $textFlag . '</td>';
    foreach ($urls as $url => $data) {
        $colour = ($urls[$url][$textFlag] == 'TRUE')? 'green':'red';
        $rows .= '<td style="background-color:' . $colour . ';">' . $urls[$url][$textFlag] . '</td>';
    }
    $rows .= '</tr>';
}
echo '<table border="1">' . $header . $rows . '</table>';
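
The two blocks above are nearly identical, so they could be folded into one function. This is just a sketch of that idea: renderTruthTable() and the tiny $demo array are made up for illustration, and the function assumes a results array shaped like the $urls array after the filtering loop has run.

```php
<?php
// Hypothetical helper: build one HTML table from results shaped like
// $results[$url][$flagLabel] = 'TRUE' or 'FALSE'.
// Pass true as the second argument to get flags as rows instead of URLs.
function renderTruthTable(array $results, $flagsAsRows = false) {
    $urls = array_keys($results);
    $flagLabels = $urls ? array_keys($results[$urls[0]]) : array();

    $rowKeys = $flagsAsRows ? $flagLabels : $urls;
    $colKeys = $flagsAsRows ? $urls : $flagLabels;

    $html = '<table border="1"><tr><th>' . ($flagsAsRows ? 'FLAG' : 'URL') . '</th>';
    foreach ($colKeys as $col) {
        $html .= '<th>' . htmlspecialchars($col) . '</th>';
    }
    $html .= '</tr>';

    foreach ($rowKeys as $row) {
        $html .= '<tr><td>' . htmlspecialchars($row) . '</td>';
        foreach ($colKeys as $col) {
            // The results are always indexed by URL first, then by flag label.
            $result = $flagsAsRows ? $results[$col][$row] : $results[$row][$col];
            $colour = ($result === 'TRUE') ? 'green' : 'red';
            $html .= '<td style="background-color:' . $colour . ';">' . $result . '</td>';
        }
        $html .= '</tr>';
    }
    return $html . '</table>';
}

// Made-up results, only to show the call.
$demo = array(
    'http://a.example/' => array('Null' => 'TRUE', 'PATH' => 'TRUE'),
    'http://b.example' => array('Null' => 'TRUE', 'PATH' => 'FALSE'),
);
echo renderTruthTable($demo);
echo '<br />';
echo renderTruthTable($demo, true);
```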

All of this code was run on PHP 5.3.3, which was the latest version at the time of writing. The results are a lot of data to cram into this page, so I won't post them all here.

One thing I did find was that the FILTER_FLAG_SCHEME_REQUIRED flag doesn't appear to do anything. I think it is there to force a scheme such as http:// to appear at the front of the URL, but since filter_var() already rejects a URL with no scheme by default, the flag is basically meaningless. Things have progressed a lot since I last looked at this validation function, but it still isn't working as expected.
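
The point about the scheme can be checked without any flags at all. As a quick sketch, here are the scheme-less URLs from the test list above run through the plain filter.

```php
<?php
// URLs from the test list that have no scheme.
$noScheme = array(
    'example.com',
    'www.example.com',
    'www.example.com/blog',
    '/index.html?q=123',
);

$rejected = 0;
foreach ($noScheme as $url) {
    // No flags are passed, yet each of these fails validation.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        $rejected++;
    }
}
echo $rejected . ' of ' . count($noScheme) . " rejected\n"; // 4 of 4 rejected

// Adding a scheme is enough for the same host to pass.
$withScheme = filter_var('http://www.example.com', FILTER_VALIDATE_URL);
var_dump($withScheme !== false); // bool(true)
```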

Comments

The FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED flags are left over from an earlier form of the code.

See bug 39898 and the source code change, from back in 2006.

Note that those flags are deliberately missing from the relevant sections of the PHP manual.


@Peter - Thanks for those links, I appreciate the clarification.

I got the list of constants from the list here:
http://www.php.net/manual/en/filter.constants.php

Which would therefore appear to be either out of date or perhaps it is just an inadequate format for displaying constant information?

I did some random clicking about in the manual pages and managed to find this list, which appears to be more up to date...
http://www.php.net/manual/en/filter.filters.flags.php

Philip Norton

Philip, you're quite right in that the flags were still mentioned in a few places.  I have changed the documentation sources to remove them entirely (that I could find) and these changes should be reflected online within the week.

thank you
