[OpenSIPS-Users] UTF8 in MySQL database

Jacek Konieczny jajcus at jajcus.net
Wed Mar 25 09:03:35 CET 2009


On Wed, Mar 25, 2009 at 08:15:51AM +0200, Dan Pascu wrote:
> On Tuesday 24 March 2009, Jacek Konieczny wrote:
> > As I was just doing an upgrade from OpenSIPS 1.4.4 to 1.5.0 I took a
> > look at the database and found out it was latin1-encoded. I didn't like it
> > much (if any non ASCII characters are supposed to be allowed in the
> > database then why should it be limited to only a few languages in the
> > world?).
> 
> Latin-1 is 8 bit transparent. So you can throw at it whatever you like and 
> it will get you back exactly what you put in, without having to worry 
> what the input encoding is.

... as long as every application connected to this database is
encoding-ignorant and treats 'latin1' as just a binary string. And
that is not a sane way to do things. The problems will start as soon as
someone tries to process this data as 'latin1' (according to the
declaration on the database) when it is not latin1.
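
A minimal Python sketch of that failure mode, under stated assumptions:
the sample string is arbitrary (not data from any real table), and
Python's "latin-1" codec only approximates what MySQL calls 'latin1'.
It shows that the bytes survive the round trip, but a client that trusts
the declared charset reads something else:

```python
# Arbitrary Polish sample text; an assumption for illustration only.
text = "Zażółć"

# A UTF-8 client writes these bytes into a column declared as latin1.
stored = text.encode("utf-8")

# An encoding-ignorant client decodes the bytes the way they were written
# and gets the original text back.
roundtrip = stored.decode("utf-8")

# A client that believes the latin1 declaration decodes every byte as one
# character and ends up with a different, longer string (mojibake).
misread = stored.decode("latin-1")

print(roundtrip == text)          # the bytes themselves are intact
print(misread == text)            # but the declared charset gives wrong text
print(len(text), len(misread))    # character counts differ
```

The data is recoverable only by a reader who already knows the
declaration is wrong, which is exactly the hidden knowledge a charset
declaration is supposed to make unnecessary.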

> UTF-8 requires that your input is already formatted as UTF-8.

But then you know what you have in the database.

> > What is the reason for rejecting UTF8? No other setting seems to make
> > much sense in an international environment.
> 
> I disagree. Many other settings make sense in an international environment 
> and latin-1 is the most transparent of them.

Then why don't we drop all primary keys, foreign keys and other
constraints from the database? Then it would be even more transparent --
we could put anything there. Putting non-latin1 strings in a "latin1"
table is like putting non-unique values in a UNIQUE column. Even if the
RDBMS in use does not care, it does not seem right.

But, back to my original question, as I can understand that 'latin1' is
OK for some or even most people. My question was: is there any specific,
technical reason that 'utf8' is forbidden? I don't think OpenSIPS runs
any SQL queries where multibyte strings would be a problem (like asking
for 3 characters only and expecting 3 bytes in the result). I don't know
MySQL well (I am just forced to use it because of CDRTool limitations),
but I don't think it would cause any serious problems with that, either.
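
To make the "3 characters vs. 3 bytes" case concrete, here is a small
Python sketch. It is an illustration only: the sample string is an
arbitrary assumption, and nothing here reflects actual OpenSIPS queries.
It shows where code that equates characters with bytes would go wrong
with UTF-8 data:

```python
# Arbitrary sample text; an assumption for illustration only.
text = "Zażółć"
encoded = text.encode("utf-8")

# Character count and byte count differ for multibyte text.
char_count = len(text)
byte_count = len(encoded)

# Taking 3 characters is always safe.
first_three_chars = text[:3]

# Taking 3 bytes may cut a multibyte character in half.
first_three_bytes = encoded[:3]

try:
    first_three_bytes.decode("utf-8")
    truncated_ok = True
except UnicodeDecodeError:
    # The third byte is only the first half of a two-byte character.
    truncated_ok = False

print(char_count, byte_count)
print(first_three_chars, first_three_bytes, truncated_ok)
```

Only code that makes this kind of byte-level assumption would be
affected by the charset change; plain equality comparisons and whole-value
lookups are not.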

Greets,
        Jacek


