PHP upgraded from 5.3.10 to 7.0.15 can't deal with non-English characters?

I have ported some software using Apache2, MySQL and php from a machine running Ubuntu 12.04 to a machine running Ubuntu Mate 16.04.

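Versions: PHP 5.3.10 on the old machine (Ubuntu 12.04) and PHP 7.0.15 on the new machine (Ubuntu MATE 16.04).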

I have copied a MySQL database from the old machine to the new machine.
The copy was exported as a .sql dump of the database. I sent it to the new machine via email.
It contains commands to create tables and insert data.
The data contains lots of accented characters (Spanish and others).

On the new machine running Ubuntu Mate 16.04:

If I look at the file in Pluma text editor, the accents look OK.
Pluma says the encoding is "Unicode (UTF-8)".
If I cat the file in a terminal window, the accents look OK.
If I open the file in Libre Office, the accents look OK.
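To double-check the dump independently of any editor, a few lines of PHP can confirm that the file's bytes are well-formed UTF-8. This is a sketch assuming the mbstring extension is installed; the function name `file_is_utf8()` and the file name `dump.sql` are hypothetical.

```php
<?php
// Sketch: report whether a file (e.g. the .sql dump) is valid UTF-8.
// Requires mbstring; file_is_utf8() is a hypothetical helper name.
function file_is_utf8(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // true only if every byte sequence in the file is well-formed UTF-8
    return mb_check_encoding($bytes, 'UTF-8');
}

// Usage (placeholder file name):
// var_dump(file_is_utf8('dump.sql'));
```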

MySQL was installed on the new machine.
The database server details are:

    Database server
    Server: Localhost via UNIX socket
    Server type: MySQL
    Server version: 5.7.18-0ubuntu0.16.04.1 - (Ubuntu)
    Protocol version: 10
    User: root@localhost
    Server charset: UTF-8 Unicode (utf8)

A database was created in MySQL on the new machine and the .sql file was imported.
The text fields are varchar.

The CREATE TABLE command in the .sql file includes

    DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

If I look at the data with phpMyAdmin, the accents look OK.

I mention all the places where the data looks OK because, when searching the Internet for solutions to my problem, I see suggestions that this sort of problem is caused by text stored in the wrong encoding.

On the new machine:

The file /etc/apache2/conf-enabled/charset.conf contains

    #AddDefaultCharset UTF-8

The HTML produced by the PHP code contains

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

The code on the new machine is updated to use mysqli.

The copied code used

    echo htmlentities($result_row[0]);

On the new machine, nothing is output if the item contains accents.

Both of the following lines of PHP show the accents incorrectly (José becomes Jos�):

    echo htmlentities($result_row[0], ENT_QUOTES | ENT_SUBSTITUTE, "UTF-8");
    echo $result_row[0];

On the old machine:

the corresponding PHP code

    echo htmlentities($result_row[0]);

shows the accents OK.

Out of curiosity I tried this code

    echo $result_row[0];

and was surprised to find that it shows the accents incorrectly: José becomes Jos�.

So it looks like htmlentities did what I expected in version 5.3.10 of php but it no longer does in version 7.0.15.
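One plausible reading of these symptoms: the default charset of `htmlentities()` changed between the two versions. In PHP 5.3 it defaulted to ISO-8859-1; PHP 5.4 changed the default to UTF-8 (and from 5.6 it follows the `default_charset` ini setting, which is itself UTF-8 by default). If the database connection delivers é as the single Latin-1 byte 0xE9, the old default turns it into `&eacute;`, while the new default sees invalid UTF-8 and returns an empty string, or U+FFFD (shown as �) with `ENT_SUBSTITUTE`. That would also explain why a plain `echo` shows Jos� on both machines. The sketch below reproduces the three cases, assuming a hard-coded byte string as a stand-in for the value fetched from MySQL.

```php
<?php
// Stand-in for a value fetched over a latin1 connection: "José" with é as byte 0xE9.
$latin1 = "Jos\xE9";

// PHP 5.3 behaviour: htmlentities() assumed ISO-8859-1, so 0xE9 became &eacute;.
$old = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');

// PHP 5.4+ default: UTF-8. A lone 0xE9 is not valid UTF-8, so the result is "".
$new = htmlentities($latin1, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE the invalid byte becomes U+FFFD, which browsers show as �.
$sub = htmlentities($latin1, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($old, $new, $sub);
```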

I would welcome a solution. Looking at the Internet I'm a bit overwhelmed by the masses of conflicting advice for dealing with the problem of getting php to output non-English text.

I would start by having a look at PHP’s configuration. I would have to say the difference is there, 5.3.x having been configured for UTF-8 and 7.0.15 probably having been configured for something else.

Does this mean there’s a PHP 5.3.10 in the MATE repositories?

Last time I checked there was no PECL version of ncurses available for PHP 7. Since I have lots of ncurses PHP code, I’d like to install PHP 5.3.10 but haven’t figured out how to do that without building it from source.

[quote=“crankypuss, post:3, topic:13285”]
Does this mean there’s a PHP 5.3.10 in the MATE repositories?
[/quote]
Not necessarily. I installed that system from an Ubuntu 12.04 LTS DVD.


Yes, I think it is likely to come from php.

Just in case the problem came from the original data (which originally came from Mac OS 9 and was converted; but surely sending it successfully by email as UTF-8 would sort that out?), I entered some new data into the database via phpMyAdmin.

I put this text

    Text in English ¿en español? عربى 中文

into a varchar(100) field.

It gets shown correctly by phpMyAdmin when I browse the row.
However, my PHP software shows it in Firefox as

    Text in English �en espa�ol? ???? ??

Interestingly, the Spanish special characters are shown as � while the Arabic and Chinese are shown as ordinary question marks.
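That pattern is what you would expect if the connection between PHP and MySQL were using Latin-1 (an assumption, but it fits the later fix). The server converts the stored UTF-8 to Latin-1: ¿ and ñ exist in Latin-1 and arrive as single bytes that are invalid in a UTF-8 page (hence �), while Arabic and Chinese have no Latin-1 form and are replaced by '?' before PHP ever sees them. The sketch below simulates this with `mb_convert_encoding()` standing in for the server's own conversion.

```php
<?php
// Simulate a latin1 connection: mb_convert_encoding() stands in for the
// conversion MySQL performs (an assumption for illustration only).
mb_substitute_character(0x3F); // use '?' for characters Latin-1 cannot represent

$stored = 'Text in English ¿en español? عربى 中文'; // UTF-8, as stored in the table
$sent   = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');

// ¿ and ñ survive as single bytes 0xBF and 0xF1 (invalid as UTF-8, so the browser
// shows �); the four Arabic and two Chinese characters each become '?'.
var_dump($sent === "Text in English \xBFen espa\xF1ol? ???? ??");
var_dump(mb_check_encoding($sent, 'UTF-8')); // false: not valid UTF-8
```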

I tried setting the mbstring variables explicitly, following the advice in www.knowledgebase-script.com.


I put this text into php.ini

    mbstring.language = all
    mbstring.internal_encoding = UTF-8
    mbstring.http_input = auto
    mbstring.http_output = UTF-8
    mbstring.encoding_translation = On
    mbstring.detect_order = UTF-8
    mbstring.substitute_character = none;
    mbstring.func_overload = 0
    mbstring.strict_encoding = Off

and reloaded apache2.


It doesn't help. :frowning:
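The mbstring settings probably can't help because they govern PHP's `mb_*` functions and HTTP input/output conversion, not the bytes mysqli receives from the server. As a sketch, assuming the Latin-1 bytes inferred from the Firefox output above, converting them after the fact recovers ¿ and ñ but cannot bring back the Arabic and Chinese, which were already turned into '?' on the server side. This illustrates why the fix has to happen at the connection; it is not a recommended workaround.

```php
<?php
// Bytes as a latin1 connection would deliver them (assumption based on the Firefox output).
$received = "Text in English \xBFen espa\xF1ol? ???? ??";

// Transcoding Latin-1 -> UTF-8 recovers ¿ and ñ ...
$repaired = mb_convert_encoding($received, 'UTF-8', 'ISO-8859-1');

// ... but the '?' placeholders for Arabic and Chinese stay '?'.
echo $repaired, PHP_EOL; // Text in English ¿en español? ???? ??
```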

I think I have solved it. When I updated my original PHP code, I just replaced my original connect and query calls with

    $dsn = mysqli_connect();
    mysqli_query($dsn, $sql);

Apparently there is another function I should be using as well:

    $dsn->set_charset('utf8');

Adding this after the connect makes it all work.
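For anyone landing here later, a minimal sketch of the working setup, assuming the procedural mysqli calls used above. The function names `connect_utf8()` and `h()` and all connection parameters are hypothetical placeholders. `mysqli_set_charset($dsn, 'utf8')` is the procedural equivalent of `$dsn->set_charset('utf8')`. The sketch also swaps in `htmlspecialchars()` with an explicit charset: on a UTF-8 page it only needs to escape `<`, `>`, `&` and quotes, and leaves é as-is; `htmlentities()` with the same arguments would work too.

```php
<?php
// Sketch: mysqli connection whose character set matches the utf8 tables
// and the charset=UTF-8 pages. Parameters are placeholders.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT); // throw on errors
    $dsn = mysqli_connect($host, $user, $pass, $db);
    // Ask the server to send (and expect) UTF-8 on this connection.
    $dsn->set_charset('utf8');
    return $dsn;
}

// Escape a UTF-8 value for HTML, naming the charset explicitly instead of
// relying on the version-dependent default.
function h(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Usage, with placeholder credentials:
// $dsn = connect_utf8('localhost', 'user', 'password', 'mydb');
// $result = mysqli_query($dsn, $sql);
// while ($result_row = mysqli_fetch_row($result)) {
//     echo h($result_row[0]);
// }
```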