Correcting Wrong Character Encoding In MySQL

Sometimes, especially when moving data from one server to another, you might find that your MySQL database has been encoded incorrectly. The problem will first show itself if the database is encoded in one charset and your website is set to display in another. If this is the case then you will find strange characters appearing in your text, especially around punctuation marks and accented letters. If you are unable or unwilling to change the character encoding on the site then you need to change how the data is encoded in the database.
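To see where the strange characters come from, here is a minimal Python sketch of the mismatch, with no database involved. The helper name misread is made up for this illustration, and the sketch assumes the text is stored as UTF-8 while the page decodes it as windows-1252 (Python's "cp1252" codec).

```python
def misread(text: str, stored_as: str = "utf-8", shown_as: str = "cp1252") -> str:
    """Encode text with one charset and decode it with another,
    the way a page with a mismatched charset would display it."""
    return text.encode(stored_as).decode(shown_as)


if __name__ == "__main__":
    # An accented letter and a curly apostrophe are typical casualties.
    for sample in ("café", "It’s"):
        print(f"{sample!r} is displayed as {misread(sample)!r}")
```

Plain ASCII text survives the round trip unchanged, which is why the damage tends to show up only around accents and "smart" punctuation.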

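The most common thing you might want to do is change from ISO-8859-1 (or windows-1252), which MySQL handles as latin1, to UTF-8. Note that MySQL's utf8 charset stores at most three bytes per character; utf8mb4 covers the full UTF-8 range. The change can be done in one of two ways.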

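The first way is to simply alter the table so that the column uses a different charset. MySQL converts the existing values as part of the change, so this works when the stored data really is in the charset the column currently declares.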

ALTER TABLE `table` MODIFY col1 VARCHAR(50) CHARACTER SET 'utf8';

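However, if your database has already been set up and your data has already been inserted in the wrong format, so that the stored bytes do not match the charset the column declares, then you can also update the data in the column using the CONVERT function. The following snippet converts the column to latin1, strips the charset label so that the value is treated as raw binary data, and then reads those bytes back as utf8. It is worth taking a backup of the table before running it.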

UPDATE `table` SET col1=CONVERT(CONVERT(CONVERT(col1 USING 'latin1') USING BINARY) USING 'utf8');
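The same three steps can be sketched outside the database. This is only an illustration in Python of what the nested CONVERT calls do to a single value, not a replacement for the query; the function name repair is made up, and "cp1252" is used on the assumption that it is close enough to MySQL's latin1 for ordinary text.

```python
def repair(garbled: str) -> str:
    """Undo one round of 'UTF-8 bytes read as latin1' damage.

    Mirrors the nested CONVERT calls:
    text -> latin1 bytes -> raw bytes -> text decoded as UTF-8.
    """
    try:
        return garbled.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The value was not damaged in this particular way; leave it alone.
        return garbled


if __name__ == "__main__":
    for value in ("cafÃ©", "Itâ€™s", "café"):
        print(f"{value!r} -> {repair(value)!r}")
```

Note what happens to the last value: text that is already correct does not survive the byte round trip, so the sketch hands it back untouched. The practical lesson for the UPDATE is to run it only against rows that are actually damaged, and only once.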

You should also make sure that the connection to the database uses a specific character set. This is done with the SET NAMES and SET CHARACTER SET statements.

SET NAMES 'charset_name';
SET CHARACTER SET 'charset_name';

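These two statements set session variables on your MySQL connection. SET NAMES sets character_set_client, character_set_connection and character_set_results to the given charset, while SET CHARACTER SET sets character_set_client and character_set_results and resets character_set_connection to the database default. For more information on what is set, look at the Connection Character Sets and Collations page on the MySQL website. This ensures that the data we get back from the database is also in the correct charset.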
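Because the two statements overlap, the order you run them in matters. The following is a toy Python model of the session variables, based on my reading of the MySQL documentation rather than on any real driver; the function names and the latin1 database default are assumptions made for the example.

```python
def set_names(session: dict, charset: str) -> None:
    """Toy model of SET NAMES: all three variables take the given charset."""
    session["character_set_client"] = charset
    session["character_set_connection"] = charset
    session["character_set_results"] = charset


def set_character_set(session: dict, charset: str) -> None:
    """Toy model of SET CHARACTER SET: client and results take the given
    charset, the connection charset falls back to the database default."""
    session["character_set_client"] = charset
    session["character_set_results"] = charset
    session["character_set_connection"] = session["character_set_database"]


# Hypothetical session on a database whose default charset is latin1.
session = {"character_set_database": "latin1"}

set_names(session, "utf8")
after_names = dict(session)

set_character_set(session, "utf8")
after_both = dict(session)

if __name__ == "__main__":
    print("after SET NAMES:        ", after_names["character_set_connection"])
    print("after SET CHARACTER SET:", after_both["character_set_connection"])
```

In this model, running SET CHARACTER SET after SET NAMES puts the connection charset back to the database default, so if the database default is not the charset you want, SET NAMES on its own is probably the safer choice.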

For a full list of the different character sets available in MySQL just run the command:

SHOW CHARACTER SET;

This will display a table with the columns Charset, Description, Default collation and Maxlen. Each charset has one or more collations, one of which is its default. A collation is a set of rules for comparing characters in a charset, so it decides how text is sorted and whether two strings count as equal. It is important to get this right if you want your queries to behave as expected. The full list of collations can be viewed using the following command:

SHOW COLLATION;

You can even use a WHERE clause with LIKE to narrow the list of collations down to the ones you are looking for.

SHOW COLLATION WHERE Charset LIKE '%utf%';
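To make the idea of a collation a little more concrete, here is a rough Python analogy. It is not how MySQL implements collations; it only contrasts comparing by raw code point, roughly what a binary (_bin) collation does, with comparing case-insensitively, roughly what a _ci collation does. The word list and the helper equal_ci are made up for the example.

```python
words = ["apple", "Banana", "cherry"]

# Code-point order: every uppercase ASCII letter sorts before
# every lowercase one, so "Banana" jumps to the front.
binary_order = sorted(words)

# Case-insensitive order: letters are compared without regard to case.
ci_order = sorted(words, key=str.casefold)


def equal_ci(a: str, b: str) -> bool:
    """Case-insensitive equality, in the spirit of a _ci collation."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    print("binary order:          ", binary_order)
    print("case-insensitive order:", ci_order)
    print("'MySQL' equals 'mysql'?", equal_ci("MySQL", "mysql"))
```

The same data gives a different ORDER BY result and a different answer to "are these two values equal?" depending on the rules in force, which is why the collation is worth checking along with the charset.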
