Monday, April 30, 2012

UTF-8 Regular Expressions in PHP

While PHP itself doesn't know about different character sets and treats all characters as being one byte long, the PCRE engine understands UTF-8. There's also mb_ereg_match(), but I prefer the PCRE functions (preg_...). Here's a piece of code to see if your PHP was compiled with PCRE UTF-8 support.

$str = 'ありがとう';
echo "strlen('$str') = " . strlen($str) . "\n";
echo "preg_match_all('/./', '$str', \$matches) = " .
  preg_match_all('/./', $str, $matches) . "\n";
echo "preg_match_all('/(*UTF8)./u', '$str', \$matches) = " .
  preg_match_all('/(*UTF8)./u', $str, $matches) . "\n";

The last call outputs the correct length of 5 characters, because the regular expression starts with (*UTF8) and uses the /u modifier.

strlen('ありがとう') = 15
preg_match_all('/./', 'ありがとう', $matches) = 15
preg_match_all('/(*UTF8)./u', 'ありがとう', $matches) = 5
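
The same check can be wrapped in small helpers. This is only a minimal sketch: the function names pcre_has_utf8() and utf8_char_count() are made up for this example, the source file is assumed to be saved as UTF-8, and the patterns rely on the /u modifier alone, which is the documented way to switch PHP's preg functions into UTF-8 mode.

```php
<?php
// Hypothetical helper: true when a /u pattern treats a 3-byte character as ONE character.
// The @ hides the warning preg_match() may raise if /u is not supported.
function pcre_has_utf8() {
  return @preg_match('/^.$/u', 'あ') === 1;
}

// Hypothetical helper: number of UTF-8 characters in $str.
// The /s modifier lets "." match newlines too, so they are counted as well.
function utf8_char_count($str) {
  return preg_match_all('/./su', $str, $matches);
}

if (pcre_has_utf8()) {
  echo utf8_char_count('ありがとう') . "\n";  // 5
}
```

Without UTF-8 mode the anchored pattern /^.$/ cannot match the three bytes of 'あ', so the helper returns false instead of silently counting bytes.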

You can also use Unicode character properties, for example to match only letters (in any language):

// The WRONG way to do it, only works for ASCII:
preg_match_all('/[a-zA-Z]/', $str, $matches);

// This way it works with any language:
preg_match_all('/(*UTF8)\p{L}/u', $str, $matches);

You can see other Unicode character properties in the PHP Manual.
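
To make the difference concrete, here is a short sketch on a mixed-script string. The sample text is made up for illustration, and the file is assumed to be saved as UTF-8. It also shows a script property (\p{Hiragana}), which narrows the match to a single writing system.

```php
<?php
$text = 'Hello мир ありがとう 123';

// \p{L}+ matches runs of letters from any script; digits and spaces are skipped.
preg_match_all('/\p{L}+/u', $text, $matches);
print_r($matches[0]);  // Hello, мир, ありがとう

// Script properties restrict the match to one writing system.
var_dump(preg_match('/^\p{Hiragana}+$/u', 'ありがとう'));  // int(1)
var_dump(preg_match('/^\p{Hiragana}+$/u', 'Hello'));       // int(0)
```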

Wednesday, April 25, 2012

Logging fatal PHP errors

If you turned off the display_errors setting in your production php.ini (as you should), then when your code dies with a fatal error, the message isn't shown anywhere. It is better to log these errors to the Apache error log. That holds even if you left display_errors on, because the log lets you debug errors that other users report. PHP has a log_errors directive in php.ini, but it doesn't seem to log anything for me. Instead, I used register_shutdown_function() to make PHP log the errors:

register_shutdown_function(function() {
  // error_get_last() returns NULL when no error occurred during the request.
  $error = error_get_last();
  if ($error !== NULL) {
    error_log('PHP Fatal: file:' . $error['file'] . ' line:' . $error['line'] .
              ' type:' . $error['type'] . ' message:' . $error['message']);
  }
});

This causes PHP to log the error on shutdown.
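
One caveat: error_get_last() returns the most recent error of any severity, so a leftover notice or warning would also be logged at shutdown and labelled as fatal. A possible refinement, as a sketch only: the helper name describe_fatal_error() is made up for this example, and the list of error types treated as fatal is my assumption.

```php
<?php
// Hypothetical helper: returns a log line for a fatal error, or NULL for anything
// else (no error at all, or a notice/warning that did not stop the script).
function describe_fatal_error($error) {
  // Assumption: these are the error types that abort the script.
  $fatal = array(E_ERROR, E_PARSE, E_CORE_ERROR, E_COMPILE_ERROR);
  if ($error === NULL || !in_array($error['type'], $fatal, true)) {
    return NULL;
  }
  return 'PHP Fatal: file:' . $error['file'] . ' line:' . $error['line'] .
         ' type:' . $error['type'] . ' message:' . $error['message'];
}

register_shutdown_function(function() {
  // Only write to the log when the last error actually was fatal.
  $line = describe_fatal_error(error_get_last());
  if ($line !== NULL) {
    error_log($line);
  }
});
```

Keeping the formatting in its own function makes it possible to try it with hand-built arrays, without having to crash a script first.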