Work Out Size In Bytes Of A PHP String

I found this very handy function on the php.net site in the user comments for the strlen() function. It accepts a string in ASCII or UTF-8 format and finds out how long that string is in bytes.

The function works by going through the string and adding up how many bytes each character occupies. For plain ASCII values this is a single byte, so 1 is added to the total. In UTF-8 a character can take several bytes: the original design allowed sequences of up to 6 bytes, although the current standard (RFC 3629) stops at 4. The rest of the function works out how many bytes a character takes up by applying bitwise AND masks to its first byte.
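The masking step can be sketched in isolation. This is a minimal illustration rather than part of the original function: the helper name utf8SequenceLength() is made up for the example, and it only covers the 1 to 4 byte sequences of modern UTF-8.

```php
<?php
// Hypothetical helper (not from the original post): given the first byte
// of a UTF-8 sequence, return how many bytes the whole sequence occupies.
function utf8SequenceLength(int $leadByte): int
{
    if (($leadByte & 0x80) === 0x00) { return 1; } // 0XXXXXXX: plain ASCII
    if (($leadByte & 0xE0) === 0xC0) { return 2; } // 110XXXXX
    if (($leadByte & 0xF0) === 0xE0) { return 3; } // 1110XXXX
    if (($leadByte & 0xF8) === 0xF0) { return 4; } // 11110XXX
    return 1; // continuation or invalid byte: count it on its own
}

echo utf8SequenceLength(ord('A')), "\n";    // 1
echo utf8SequenceLength(ord("\xC3")), "\n"; // 2 (first byte of "é")
echo utf8SequenceLength(ord("\xE2")), "\n"; // 3 (first byte of "€")
echo utf8SequenceLength(ord("\xF0")), "\n"; // 4 (first byte of most emoji)
```

Each mask keeps only the leading bits that mark the sequence length, so a single comparison per case is enough.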

/**
* Count the number of bytes of a given string.
* Input string is expected to be ASCII or UTF-8 encoded.
* Warning: the function doesn't return the number of chars
* in the string, but the number of bytes.
* See http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
* for information on UTF-8.
*
* @param string $str The string to compute number of bytes
*
* @return int The length in bytes of the given string.
*/
function strBytes($str){
 // STRINGS ARE EXPECTED TO BE IN WELL-FORMED ASCII OR UTF-8 FORMAT
 
 // Number of bytes in string (strlen() counts bytes, not characters)
 $strlen_var = strlen($str);
 
 // string bytes counter
 $d = 0;
 
 /*
 * Iterate over the first byte of every character in the string,
 * adding the length of its UTF-8 sequence and skipping the
 * continuation bytes so that they are not counted twice
 */
 for($c = 0; $c < $strlen_var; ++$c){
  // $str[$c] replaces $str{$c}, which was removed in PHP 8
  $ord_var_c = ord($str[$c]);
  switch(true){
  case(($ord_var_c >= 0x20) && ($ord_var_c <= 0x7F)):
   // characters U-00000000 - U-0000007F (same as ASCII)
   $d++;
   break;
  case(($ord_var_c & 0xE0) == 0xC0):
   // characters U-00000080 - U-000007FF, mask 110XXXXX
   $d+=2;
   $c+=1;
   break;
  case(($ord_var_c & 0xF0) == 0xE0):
   // characters U-00000800 - U-0000FFFF, mask 1110XXXX
   $d+=3;
   $c+=2;
   break;
  case(($ord_var_c & 0xF8) == 0xF0):
   // characters U-00010000 - U-001FFFFF, mask 11110XXX
   $d+=4;
   $c+=3;
   break;
  case(($ord_var_c & 0xFC) == 0xF8):
   // characters U-00200000 - U-03FFFFFF, mask 111110XX
   $d+=5;
   $c+=4;
   break;
  case(($ord_var_c & 0xFE) == 0xFC):
   // characters U-04000000 - U-7FFFFFFF, mask 1111110X
   $d+=6;
   $c+=5;
   break;
  default:
   // control characters and any stray byte count as one byte
   $d++;
  }
 }
 return $d;
}
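As a quick reference for what a byte counter should report, the sketch below prints byte and character counts for a few sample strings. It relies only on built-ins: strlen() for bytes and a UTF-8 aware regular expression for characters. The sample strings are made up for the illustration.

```php
<?php
// Sample strings: plain ASCII, a 2-byte, a 3-byte and a 4-byte character.
$samples = ['hello', "h\u{00E9}llo", "\u{20AC}5", "\u{1F600}"];

foreach ($samples as $s) {
    $bytes = strlen($s);                  // strlen() counts bytes
    $chars = preg_match_all('/./su', $s); // one match per UTF-8 character
    echo "$s: $bytes bytes, $chars characters\n";
}
// hello: 5 bytes, 5 characters
// héllo: 6 bytes, 5 characters
// €5: 4 bytes, 2 characters
// 😀: 4 bytes, 1 characters
```

The two counts only differ once a string contains characters outside the ASCII range.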

This function is useful if you want to know how large a string is in bytes but have little control over how the string reaches you. For example, if you download a web page and want to know how large it is in bytes, you can pass the content of the page into this function. Bear in mind that PHP's built-in strlen() also counts bytes rather than characters; mb_strlen() is the function that counts characters.

You might think that the Content-Length header could be used here, but you can't rely on every site returning it. Some sites simply omit the header (responses sent with chunked transfer encoding do not carry one), whilst others just put a default amount there.
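A rough sketch of that comparison follows. The helper findContentLength() is a hypothetical name, and the header lines and body are made-up sample data so that the example runs without a network call; in real code they would come from whatever HTTP client fetched the page.

```php
<?php
// Hypothetical helper: pull the Content-Length value out of a list of
// raw header lines, or return null when the header is missing.
function findContentLength(array $headerLines): ?int
{
    foreach ($headerLines as $line) {
        // Header names are case-insensitive, hence stripos().
        if (stripos($line, 'Content-Length:') === 0) {
            return (int) trim(substr($line, strlen('Content-Length:')));
        }
    }
    return null;
}

// Made-up sample response in which the server omitted Content-Length.
$headers = ['HTTP/1.1 200 OK', 'Content-Type: text/html; charset=UTF-8'];
$body    = "<p>Caf\u{00E9}</p>";

$declared = findContentLength($headers); // null: header was omitted
$measured = strlen($body);               // bytes actually received

echo $declared === null ? "no Content-Length header\n" : "declared: $declared\n";
echo "measured: $measured bytes\n";      // measured: 12 bytes
```

Measuring the body directly gives an answer even when the header is absent, and lets you spot a declared value that does not match what was delivered.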

Comments

It returned the string length, not the size in bytes.

nice try though.
