We use Zend Lucene Search on a PHP ecommerce website and ran into an issue where records added to the search index from the website interface weren't the same as those created from the command line. It turned out to be a locale issue, and setting the locale explicitly fixed the problem.
Error messages
A clue that something wasn't working correctly was this error message:
iconv(): Detected an illegal character in input string
There are a variety of conversions that can be done to fix this error, but our problem appears to have been caused by an unconfigured locale, which defaulted to "C" and does not support UTF-8.
Check what locale is currently being used
In PHP, you can call setlocale(LC_ALL, 0) to find out what the locale is currently set to. Running through Nginx with PHP-FPM, it output this:
C
and from the command line this:
en_NZ.UTF-8
Running "locale" from an SSH terminal session output this:
LANG=en_NZ.UTF-8
LANGUAGE=
LC_CTYPE="en_NZ.UTF-8"
LC_NUMERIC="en_NZ.UTF-8"
LC_TIME="en_NZ.UTF-8"
LC_COLLATE="en_NZ.UTF-8"
LC_MONETARY="en_NZ.UTF-8"
LC_MESSAGES="en_NZ.UTF-8"
LC_PAPER="en_NZ.UTF-8"
LC_NAME="en_NZ.UTF-8"
LC_ADDRESS="en_NZ.UTF-8"
LC_TELEPHONE="en_NZ.UTF-8"
LC_MEASUREMENT="en_NZ.UTF-8"
LC_IDENTIFICATION="en_NZ.UTF-8"
LC_ALL=
which would indicate the CLI script is picking up the locale from the system, but Nginx/PHP-FPM is not.
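To compare the two environments more closely, it can help to print the locale for each category rather than only LC_ALL. The following is a minimal sketch; the function name current_locales() is made up for illustration and is not part of PHP or Zend. It relies on the documented behaviour that passing "0" as the locale makes setlocale() return the current setting without changing it.

```php
<?php
// Hypothetical helper: report the current locale for each category.
// Passing "0" as the locale only queries the setting, it does not change it.
function current_locales(): array
{
    $categories = [
        'LC_COLLATE'  => LC_COLLATE,
        'LC_CTYPE'    => LC_CTYPE,
        'LC_MONETARY' => LC_MONETARY,
        'LC_NUMERIC'  => LC_NUMERIC,
        'LC_TIME'     => LC_TIME,
    ];

    // LC_MESSAGES is only defined when PHP was built with libintl.
    if (defined('LC_MESSAGES')) {
        $categories['LC_MESSAGES'] = LC_MESSAGES;
    }

    $result = [];
    foreach ($categories as $name => $constant) {
        $result[$name] = setlocale($constant, '0');
    }

    return $result;
}
```

Running print_r(current_locales()) once from a page served by PHP-FPM and once from the command line should show which categories differ between the two.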
How to fix
It's possible to set the default locale in the php.ini file, although it might not be a good idea if you run many websites on your server as it could cause issues.
Instead, use setlocale() to set it specifically for your website. Check out the www.php.net/setlocale manual page for more information about the function.
In our case, we did this:
setlocale(LC_ALL, 'en_NZ.UTF-8');
This then solved the issue with adding documents to the Lucene index.
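One thing to watch for: setlocale() returns false when the requested locale is not installed on the server, and the previous locale (such as "C") stays in effect without any error being raised. A slightly more defensive version of the call above might look like the sketch below. The function name apply_site_locale() and the fallback spelling 'en_NZ.utf8' are assumptions for illustration, not something from our original setup.

```php
<?php
// Hypothetical helper: try each candidate locale in order and fail loudly
// if none of them is available, instead of silently staying on "C".
function apply_site_locale(array $candidates): string
{
    // setlocale() accepts an array and uses the first locale that works.
    $applied = setlocale(LC_ALL, $candidates);

    if ($applied === false) {
        throw new RuntimeException(
            'None of the requested locales are available: ' . implode(', ', $candidates)
        );
    }

    // The return value is the locale string that was actually applied.
    return $applied;
}
```

It could be called early in the site's bootstrap, for example apply_site_locale(['en_NZ.UTF-8', 'en_NZ.utf8']), so that a missing locale shows up as an exception rather than as odd behaviour in the search index later on.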