<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Correcting Corrupted Characters</title>
	<atom:link href="http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/feed/" rel="self" type="application/rss+xml" />
	<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/</link>
	<description>Things that Eric A. Meyer, CSS expert, writes about on his personal Web site; it&#039;s largely Web standards and Web technology, but also various bits of culture, politics, personal observations, and other miscellaneous stuff</description>
	<lastBuildDate>Fri, 19 Mar 2010 00:27:46 -0400</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Eran Galperin</title>
		<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-491729</link>
		<dc:creator>Eran Galperin</dc:creator>
		<pubDate>Thu, 04 Feb 2010 17:54:15 +0000</pubDate>
		<guid isPermaLink="false">http://meyerweb.com/eric/thoughts/?p=1214#comment-491729</guid>
		<description>I&#039;m not sure if this is still relevant, but since I didn&#039;t see any mention of this in the other comments, and going by the character set details you posted, the issue is probably in the connection character set / collation.

It&#039;s a common issue that MySQL selects an inappropriate connection collation, regardless of the headers in the HTTP request (those are irrelevant, since it is the PHP script that connects to the database). You can either force the connection to UTF-8 in the MySQL configuration, or issue two queries on every connection that set it to UTF-8.

Those would be:
SET CHARACTER SET UTF8;
SET NAMES UTF8;

You can read about those in the MySQL docs:
http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html</description>
		<content:encoded><![CDATA[<p>I&#8217;m not sure if this is still relevant, but since I didn&#8217;t see any mention of this in the other comments, and going by the character set details you posted, the issue is probably in the connection character set / collation.</p>
<p>It&#8217;s a common issue that MySQL selects an inappropriate connection collation, regardless of the headers in the HTTP request (those are irrelevant, since it is the PHP script that connects to the database). You can either force the connection to UTF-8 in the MySQL configuration, or issue two queries on every connection that set it to UTF-8.</p>
<p>Those would be:<br />
SET CHARACTER SET UTF8;<br />
SET NAMES UTF8;</p>
<p>You can read about those in the MySQL docs:<br />
<a href="http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html" rel="nofollow">http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Johan Sand</title>
		<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-489372</link>
		<dc:creator>Johan Sand</dc:creator>
		<pubDate>Thu, 07 Jan 2010 22:34:19 +0000</pubDate>
		<guid isPermaLink="false">http://meyerweb.com/eric/thoughts/?p=1214#comment-489372</guid>
		<description>and for Friday Fun - if you want to update the entire database including all potentially affected records in all relevant fields in all tables, then this would be a crazy kenobi option.

This time only amend db host, user, pass and name.

Again, upload to site and run through firefox as is.

ps. make sure there&#039;s enough execution time for php to wrap it up.

&lt;code&gt;
&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Strict//EN&quot; &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd&quot;&gt;
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot; xml:lang=&quot;en&quot; lang=&quot;en&quot;&gt;
&lt;head&gt;
	&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html;charset=utf-8&quot; /&gt;
	&lt;meta name=&quot;uid&quot; content=&quot;10&quot; /&gt;
&lt;/head&gt;
&lt;body&gt;

&lt;?php

$db_host = &#039;host&#039;;
$db_user = &#039;user&#039;;
$db_pass = &#039;pass&#039;;
$db_name = &#039;name&#039;;

$DB = new mysqli($db_host, $db_user, $db_pass, $db_name);

$field_types = array(&#039;varchar&#039;,&#039;text&#039;,&#039;tinytext&#039;,&#039;longtext&#039;);

if ($res_tables = $DB-&gt;query (&quot;SHOW TABLES&quot;)) {
    while ($tables = $res_tables-&gt;fetch_array(MYSQLI_NUM) ) {

        if ($res_fields = $DB-&gt;query (&quot;SHOW COLUMNS FROM &quot;.$tables[0])) {

            if ($res_key = $DB-&gt;query (&quot;SHOW COLUMNS FROM &quot;.$tables[0].&quot; WHERE `Key` LIKE &#039;PRI&#039;&quot;)) {
                $key = $res_key-&gt;fetch_assoc();
                $unique_key = $key[&#039;Field&#039;];
            }

            while ($fields = $res_fields-&gt;fetch_array(MYSQLI_ASSOC) ) {
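                // SHOW COLUMNS reports e.g. varchar(255); strip the length before comparing
                if (in_array(preg_replace(&#039;/\(.*$/&#039;, &#039;&#039;, $fields[&#039;Type&#039;]), $field_types)) {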

                    $DB-&gt;query(&quot;SET NAMES latin1&quot;);

                    if ($res = $DB-&gt;query (&quot;SELECT &quot;.$unique_key.&quot;, &quot;.$fields[&#039;Field&#039;].&quot; FROM &quot;.$tables[0].&quot; WHERE 1=1&quot;)) {
                        while ($data = $res-&gt;fetch_object() ) {

                            $DB-&gt;query(&quot;SET NAMES utf8;&quot;);

                            $unique_field = $data-&gt;$unique_key;
                            $fix_field = bin2hex($data-&gt;{$fields[&#039;Field&#039;]});

                            $result = $DB-&gt;query (&quot;
                                UPDATE &quot;.$tables[0].&quot;
                                SET &quot;.$fields[&#039;Field&#039;].&quot; = UNHEX(&#039;&quot;.$DB-&gt;real_escape_string($fix_field).&quot;&#039;)
                                WHERE &quot;.$unique_key.&quot; = &#039;&quot;.$unique_field.&quot;&#039;
                            &quot;);

                            unset($unique_field);
                            unset($fix_field);

                        }
                    }

                }
            }

        }
        unset($key);
        unset($unique_key);

    }
}

?&gt;
&lt;/code&gt;</description>
		<content:encoded><![CDATA[<p>and for Friday Fun &#8211; if you want to update the entire database including all potentially affected records in all relevant fields in all tables, then this would be a crazy kenobi option.</p>
<p>This time only amend db host, user, pass and name.</p>
<p>Again, upload to site and run through firefox as is.</p>
<p>ps. make sure there&#8217;s enough execution time for php to wrap it up.</p>
<p><code><br />
&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"&gt;<br />
&lt;html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"&gt;<br />
&lt;head&gt;<br />
	&lt;meta http-equiv="Content-Type" content="text/html;charset=utf-8" /&gt;<br />
	&lt;meta name="uid" content="10" /&gt;<br />
&lt;/head&gt;<br />
&lt;body&gt;</p>
<p>&lt;?php</p>
<p>$db_host = 'host';<br />
$db_user = 'user';<br />
$db_pass = 'pass';<br />
$db_name = 'name';</p>
<p>$DB = new mysqli($db_host, $db_user, $db_pass, $db_name);</p>
<p>$field_types = array('varchar','text','tinytext','longtext');</p>
<p>if ($res_tables = $DB-&gt;query ("SHOW TABLES")) {<br />
    while ($tables = $res_tables-&gt;fetch_array(MYSQLI_NUM) ) {</p>
<p>        if ($res_fields = $DB-&gt;query ("SHOW COLUMNS FROM ".$tables[0])) {</p>
<p>            if ($res_key = $DB-&gt;query ("SHOW COLUMNS FROM ".$tables[0]." WHERE `Key` LIKE 'PRI'")) {<br />
                $key = $res_key-&gt;fetch_assoc();<br />
                $unique_key = $key['Field'];<br />
            }</p>
<p>            while ($fields = $res_fields-&gt;fetch_array(MYSQLI_ASSOC) ) {<br />
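                // SHOW COLUMNS reports e.g. varchar(255); strip the length before comparing<br />
                if (in_array(preg_replace('/\(.*$/', '', $fields['Type']), $field_types)) {</p>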
<p>                    $DB-&gt;query("SET NAMES latin1");</p>
<p>                    if ($res = $DB-&gt;query ("SELECT ".$unique_key.", ".$fields['Field']." FROM ".$tables[0]." WHERE 1=1")) {<br />
                        while ($data = $res-&gt;fetch_object() ) {</p>
<p>                            $DB-&gt;query("SET NAMES utf8;");</p>
<p>                            $unique_field = $data-&gt;$unique_key;<br />
                            $fix_field = bin2hex($data-&gt;{$fields['Field']});</p>
<p>                            $result = $DB-&gt;query ("<br />
                                UPDATE ".$tables[0]."<br />
                                SET ".$fields['Field']." = UNHEX('".$DB-&gt;real_escape_string($fix_field)."')<br />
                                WHERE ".$unique_key." = '".$unique_field."'<br />
                            ");</p>
<p>                            unset($unique_field);<br />
                            unset($fix_field);</p>
<p>                        }<br />
                    }</p>
<p>                }<br />
            }</p>
<p>        }<br />
        unset($key);<br />
        unset($unique_key);</p>
<p>    }<br />
}</p>
<p>?&gt;<br />
</code></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Johan Sand</title>
		<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-489324</link>
		<dc:creator>Johan Sand</dc:creator>
		<pubDate>Thu, 07 Jan 2010 08:11:30 +0000</pubDate>
		<guid isPermaLink="false">http://meyerweb.com/eric/thoughts/?p=1214#comment-489324</guid>
		<description>To make a very long story short - this is what you need to do:

- Amend DB specific entries (host, user, pass, db, fields and table).
- Don&#039;t fiddle with the rest of the code.
- Upload to browsable part of your web server/site.
- Call the &quot;page&quot; from firefox.
- Wait (and don&#039;t reload) until complete.

If you need any of the code explained, feel free to drop me an email.

hth, cheers.
/j.

ps. the code tag strips brackets, which is a bit annoying (converted to lt&#124;gt)...

&lt;code&gt;
&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Strict//EN&quot; &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd&quot;&gt;
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot; xml:lang=&quot;en&quot; lang=&quot;en&quot;&gt;
&lt;head&gt;
	&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html;charset=utf-8&quot; /&gt;
	&lt;meta name=&quot;uid&quot; content=&quot;10&quot; /&gt;
&lt;/head&gt;
&lt;body&gt;

&lt;?php

$DB = new mysqli(&#039;host&#039;,&#039;user&#039;,&#039;pass&#039;,&#039;db&#039;);
$DB-&gt;query(&quot;SET NAMES latin1&quot;);

if ($res = $DB-&gt;query (&quot;SELECT unique_field, fix_field_1, fix_field_2, fix_field_3, fix_field_4 FROM fix_table WHERE 1=1&quot;)) {

  echo &#039;rows: &#039;.$res-&gt;num_rows;
  $cnt = 0;

  while ($data = $res-&gt;fetch_object() ) {

    $DB-&gt;query(&quot;SET NAMES utf8;&quot;);

    $unique_field = $data-&gt;unique_field;
    $fix_field_1 = bin2hex($data-&gt;fix_field_1);
    $fix_field_2 = bin2hex($data-&gt;fix_field_2);
    $fix_field_3 = bin2hex($data-&gt;fix_field_3);
    $fix_field_4 = bin2hex($data-&gt;fix_field_4);

    $result = $DB-&gt;query (&quot;
        UPDATE fix_table
        SET
            fix_field_1 = UNHEX(&#039;&quot;.$DB-&gt;real_escape_string($fix_field_1).&quot;&#039;),
            fix_field_2 = UNHEX(&#039;&quot;.$DB-&gt;real_escape_string($fix_field_2).&quot;&#039;),
            fix_field_3 = UNHEX(&#039;&quot;.$DB-&gt;real_escape_string($fix_field_3).&quot;&#039;),
            fix_field_4 = UNHEX(&#039;&quot;.$DB-&gt;real_escape_string($fix_field_4).&quot;&#039;)
            WHERE unique_field = &#039;&quot;.$unique_field.&quot;&#039;&quot;);

    echo $cnt.&quot; - &quot;.$unique_field.&quot;&lt;br /&gt;&quot;;

    unset($unique_field);
    unset($fix_field_1);
    unset($fix_field_2);
    unset($fix_field_3);
    unset($fix_field_4);

    echo $DB-&gt;error;

    $cnt++;
    }
}

?&gt;
&lt;/code&gt;</description>
		<content:encoded><![CDATA[<p>To make a very long story short &#8211; this is what you need to do:</p>
<p>- Amend DB specific entries (host, user, pass, db, fields and table).<br />
- Don&#8217;t fiddle with the rest of the code.<br />
- Upload to browsable part of your web server/site.<br />
- Call the &#8220;page&#8221; from firefox.<br />
- Wait (and don&#8217;t reload) until complete.</p>
<p>If you need any of the code explained, feel free to drop me an email.</p>
<p>hth, cheers.<br />
/j.</p>
<p>ps. the code tag strips brackets, which is a bit annoying (converted to lt|gt)&#8230;</p>
<p><code><br />
&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"&gt;<br />
&lt;html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"&gt;<br />
&lt;head&gt;<br />
	&lt;meta http-equiv="Content-Type" content="text/html;charset=utf-8" /&gt;<br />
	&lt;meta name="uid" content="10" /&gt;<br />
&lt;/head&gt;<br />
&lt;body&gt;</p>
<p>&lt;?php</p>
<p>$DB = new mysqli('host','user','pass','db');<br />
$DB-&gt;query("SET NAMES latin1");</p>
<p>if ($res = $DB-&gt;query ("SELECT unique_field, fix_field_1, fix_field_2, fix_field_3, fix_field_4 FROM fix_table WHERE 1=1")) {</p>
<p>  echo 'rows: '.$res-&gt;num_rows;<br />
  $cnt = 0;</p>
<p>  while ($data = $res-&gt;fetch_object() ) {</p>
<p>    $DB-&gt;query("SET NAMES utf8;");</p>
<p>    $unique_field = $data-&gt;unique_field;<br />
    $fix_field_1 = bin2hex($data-&gt;fix_field_1);<br />
    $fix_field_2 = bin2hex($data-&gt;fix_field_2);<br />
    $fix_field_3 = bin2hex($data-&gt;fix_field_3);<br />
    $fix_field_4 = bin2hex($data-&gt;fix_field_4);</p>
<p>    $result = $DB-&gt;query ("<br />
        UPDATE fix_table<br />
        SET<br />
            fix_field_1 = UNHEX('".$DB-&gt;real_escape_string($fix_field_1)."'),<br />
            fix_field_2 = UNHEX('".$DB-&gt;real_escape_string($fix_field_2)."'),<br />
            fix_field_3 = UNHEX('".$DB-&gt;real_escape_string($fix_field_3)."'),<br />
            fix_field_4 = UNHEX('".$DB-&gt;real_escape_string($fix_field_4)."')<br />
            WHERE unique_field = '".$unique_field."'");</p>
<p>    echo $cnt." - ".$unique_field."&lt;br /&gt;";</p>
<p>    unset($unique_field);<br />
    unset($fix_field_1);<br />
    unset($fix_field_2);<br />
    unset($fix_field_3);<br />
    unset($fix_field_4);</p>
<p>    echo $DB-&gt;error;</p>
<p>    $cnt++;<br />
    }<br />
}</p>
<p>?&gt;<br />
</code></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Josue Rodriguez</title>
		<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-489287</link>
		<dc:creator>Josue Rodriguez</dc:creator>
		<pubDate>Wed, 06 Jan 2010 23:29:07 +0000</pubDate>
		<guid isPermaLink="false">http://meyerweb.com/eric/thoughts/?p=1214#comment-489287</guid>
		<description>This powerful but simple Perl script is marvelous for converting your MySQL database charsets to UTF-8 quickly and easily. I use it every time.

&lt;a href=&quot;http://www.pablowe.net/convert_charset&quot; rel=&quot;nofollow&quot;&gt;http://www.pablowe.net/convert_charset&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>This powerful but simple Perl script is marvelous for converting your MySQL database charsets to UTF-8 quickly and easily. I use it every time.</p>
<p><a href="http://www.pablowe.net/convert_charset" rel="nofollow">http://www.pablowe.net/convert_charset</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andreas Lagerkvist</title>
		<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-488804</link>
		<dc:creator>Andreas Lagerkvist</dc:creator>
		<pubDate>Sat, 02 Jan 2010 11:31:18 +0000</pubDate>
		<guid isPermaLink="false">http://meyerweb.com/eric/thoughts/?p=1214#comment-488804</guid>
		<description>I&#039;m not sure if this helps, and I know some people already pointed some of it out, but I recently converted my DB to UTF-8 and this is what I did:

1. mysqldump the whole thing to a file
2. Add a special character (like &quot;Ö&quot;) to said file that looks good in the editor
3. Open the file with Firefox and check which encoding is used when the &quot;Ö&quot; looks ok (to find out exactly what encoding the file is)
4. Run iconv on the file to actually convert it to UTF-8 (from whatever encoding Firefox said it was)
5. Manually convert bad characters to good ones (and change potential encoding=latin1-settings in the sql-file to utf8)
6. Create new database where everything is UTF-8
7. Import the new, clean, utf8 SQL

That worked for me at least, and I&#039;ve had problems with encodings for as long as I can remember.

I think one important bit I didn&#039;t see in the comments (although it may have been mentioned) is to not only convert the characters but also convert the actual file (which I used iconv for).</description>
		<content:encoded><![CDATA[<p>I&#8217;m not sure if this helps, and I know some people already pointed some of it out, but I recently converted my DB to UTF-8 and this is what I did:</p>
<p>1. mysqldump the whole thing to a file<br />
2. Add a special character (like &#8220;Ö&#8221;) to said file that looks good in the editor<br />
3. Open the file with Firefox and check which encoding is used when the &#8220;Ö&#8221; looks ok (to find out exactly what encoding the file is)<br />
4. Run iconv on the file to actually convert it to UTF-8 (from whatever encoding Firefox said it was)<br />
5. Manually convert bad characters to good ones (and change potential encoding=latin1-settings in the sql-file to utf8)<br />
6. Create new database where everything is UTF-8<br />
7. Import the new, clean, utf8 SQL</p>
<p>That worked for me at least, and I&#8217;ve had problems with encodings for as long as I can remember.</p>
<p>I think one important bit I didn&#8217;t see in the comments (although it may have been mentioned) is to not only convert the characters but also convert the actual file (which I used iconv for).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Emil Björklund</title>
		<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-486762</link>
		<dc:creator>Emil Björklund</dc:creator>
		<pubDate>Sat, 12 Dec 2009 10:01:09 +0000</pubDate>
		<guid isPermaLink="false">http://meyerweb.com/eric/thoughts/?p=1214#comment-486762</guid>
		<description>After reading this article + comment thread, I&#039;ve decided that the easiest solution to these pesky character encoding problems is if I just change my name.

I was thinking maybe Emil Borkedchar?</description>
		<content:encoded><![CDATA[<p>After reading this article + comment thread, I&#8217;ve decided that the easiest solution to these pesky character encoding problems is if I just change my name.</p>
<p>I was thinking maybe Emil Borkedchar?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike D.</title>
		<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-486262</link>
		<dc:creator>Mike D.</dc:creator>
		<pubDate>Mon, 07 Dec 2009 02:15:30 +0000</pubDate>
		<guid isPermaLink="false">http://meyerweb.com/eric/thoughts/?p=1214#comment-486262</guid>
		<description>&quot;This whole post (and the comments) in my opinion demonstrates quite well why you should not trust your data to a database.&quot;

Funny.

Seriously though, you&#039;re probably already going to do this but please post a follow-up post with an overview of the problem and the eventual solution, when you find it. Going through all of these comments makes me feel like a total N00000000B. This has happened to me in WordPress a couple of times and each time I&#039;ve just done manual search-and-replace for the characters I know about.</description>
		<content:encoded><![CDATA[<p>&#8220;This whole post (and the comments) in my opinion demonstrates quite well why you should not trust your data to a database.&#8221;</p>
<p>Funny.</p>
<p>Seriously though, you&#8217;re probably already going to do this but please post a follow-up post with an overview of the problem and the eventual solution, when you find it. Going through all of these comments makes me feel like a total N00000000B. This has happened to me in WordPress a couple of times and each time I&#8217;ve just done manual search-and-replace for the characters I know about.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Aeron Glemann</title>
		<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-485985</link>
		<dc:creator>Aeron Glemann</dc:creator>
		<pubDate>Fri, 04 Dec 2009 13:32:09 +0000</pubDate>
		<guid isPermaLink="false">http://meyerweb.com/eric/thoughts/?p=1214#comment-485985</guid>
		<description>I&#039;ve had to deal with this a bunch of times.... what I do - and it&#039;s always worked for me - is 1st do a dump. Then - assuming you&#039;re on Mac or Linux - run from the commandline:

iconv -f latin1 -t utf8 myDump.sql &gt; myDumpUTF8.sql
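
What that iconv call does to the bytes can be sketched in Python. This is only an illustration under assumptions: the sample bytes and the function name are made up, and a real dump would be read from and written to files rather than held in a variable.

```python
def latin1_to_utf8(raw):
    """Reinterpret latin1 bytes as text and re-encode them as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


# Hypothetical sample: "cafe" with an acute e, as it would sit in a latin1 dump.
latin1_dump = b"caf\xe9"
utf8_dump = latin1_to_utf8(latin1_dump)
print(utf8_dump)

# Caveat: if the input is already UTF-8, the same call double-encodes it.
already_utf8 = "caf\u00e9".encode("utf-8")
print(latin1_to_utf8(already_utf8))
```

So the conversion is only safe when the dump really is latin1; checking a few non-ASCII characters first is cheap.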

Reimport....</description>
		<content:encoded><![CDATA[<p>I&#8217;ve had to deal with this a bunch of times&#8230;. what I do &#8211; and it&#8217;s always worked for me &#8211; is 1st do a dump. Then &#8211; assuming you&#8217;re on Mac or Linux &#8211; run from the commandline:</p>
<p>iconv -f latin1 -t utf8 myDump.sql &gt; myDumpUTF8.sql</p>
<p>Reimport&#8230;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Matt Sharkey</title>
		<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-485743</link>
		<dc:creator>Matt Sharkey</dc:creator>
		<pubDate>Wed, 02 Dec 2009 23:31:17 +0000</pubDate>
		<guid isPermaLink="false">http://meyerweb.com/eric/thoughts/?p=1214#comment-485743</guid>
		<description>Finally solved this problem for myself, using the method described in this post:

http://tlug.dnho.net/?q=node/276

Yes, it&#039;s another MySQL dump &amp; import procedure. Haven&#039;t checked for truncated content, but so far all my em &amp; en dashes look good.</description>
		<content:encoded><![CDATA[<p>Finally solved this problem for myself, using the method described in this post:</p>
<p><a href="http://tlug.dnho.net/?q=node/276" rel="nofollow">http://tlug.dnho.net/?q=node/276</a></p>
<p>Yes, it&#8217;s another MySQL dump &amp; import procedure. Haven&#8217;t checked for truncated content, but so far all my em &amp; en dashes look good.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wade Kwon</title>
		<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-485281</link>
		<dc:creator>Wade Kwon</dc:creator>
		<pubDate>Sat, 28 Nov 2009 23:55:55 +0000</pubDate>
		<guid isPermaLink="false">http://meyerweb.com/eric/thoughts/?p=1214#comment-485281</guid>
		<description>Eric: Just quickly commenting to say that I&#039;m having the exact same problem of late, and will read through the comments and any updates from you on a workable solution. Tired of doing find/replace.</description>
		<content:encoded><![CDATA[<p>Eric: Just quickly commenting to say that I&#8217;m having the exact same problem of late, and will read through the comments and any updates from you on a workable solution. Tired of doing find/replace.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ash Searle</title>
		<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-484814</link>
		<dc:creator>Ash Searle</dc:creator>
		<pubDate>Tue, 24 Nov 2009 17:08:46 +0000</pubDate>
		<guid isPermaLink="false">http://meyerweb.com/eric/thoughts/?p=1214#comment-484814</guid>
		<description>@Eric,

I had to do this last year and blogged about it at the time.  I remember an early draft including instructions &quot;open vim and...&quot; - I quickly realised as soon as you get an editor involved you&#039;re fooked.  Fortunately, MySQL has a command-line tool for doing search-and-replace so you don&#039;t have to worry about editor settings or other random phenomena.  (the instructions are in my  &lt;a href=&quot;http://hexmen.com/blog/2008/07/mysql-latin1-utf8-wordpress-upgrade/&quot; rel=&quot;nofollow&quot;&gt;latin1 to utf8 conversion&lt;/a&gt; post)

BTW.  Using Safari 4.0.4 (the latest) on OS X, the encoding in this article looks fine, but the comments are screwed up.  Forcing the text-encoding to ISO Latin 1 fixes the comments, but borks the names of the commenters (e.g. Tantek Çelik)  I don&#039;t know how far you think you&#039;ve got fixing the issues, but it looks like there&#039;s some way to go...  (Note: using the web inspector / firebug  you can check document.characterSet for the displayed character-set - which is handy when you&#039;re checking you&#039;ve overridden the text-encoding via browser menus.)</description>
		<content:encoded><![CDATA[<p>@Eric,</p>
<p>I had to do this last year and blogged about it at the time.  I remember an early draft including instructions &#8220;open vim and&#8230;&#8221; &#8211; I quickly realised as soon as you get an editor involved you&#8217;re fooked.  Fortunately, MySQL has a command-line tool for doing search-and-replace so you don&#8217;t have to worry about editor settings or other random phenomena.  (the instructions are in my  <a href="http://hexmen.com/blog/2008/07/mysql-latin1-utf8-wordpress-upgrade/" rel="nofollow">latin1 to utf8 conversion</a> post)</p>
<p>BTW.  Using Safari 4.0.4 (the latest) on OS X, the encoding in this article looks fine, but the comments are screwed up.  Forcing the text-encoding to ISO Latin 1 fixes the comments, but borks the names of the commenters (e.g. Tantek Çelik)  I don&#8217;t know how far you think you&#8217;ve got fixing the issues, but it looks like there&#8217;s some way to go&#8230;  (Note: using the web inspector / firebug  you can check document.characterSet for the displayed character-set &#8211; which is handy when you&#8217;re checking you&#8217;ve overridden the text-encoding via browser menus.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeroen Pulles</title>
		<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-484616</link>
		<dc:creator>Jeroen Pulles</dc:creator>
		<pubDate>Sun, 22 Nov 2009 16:49:07 +0000</pubDate>
		<guid isPermaLink="false">http://meyerweb.com/eric/thoughts/?p=1214#comment-484616</guid>
		<description>I had the same or similar problem last year, with a client, where my Wordpress data got encoded to UTF-8 twice. I rolled &lt;a href=&quot;http://www.redslider.net/2009/creode/creode.py.html&quot; rel=&quot;nofollow&quot;&gt;my own script&lt;/a&gt; to &quot;double decode&quot; the binary mess in my SQL dump file back to some sane text with the script that is linked above. Perhaps that can be of any help, if you&#039;re the scripting kind of person.</description>

		<content:encoded><![CDATA[<p>I had the same or a similar problem last year with a client, where my WordPress data got encoded to UTF-8 twice. I rolled <a href="http://www.redslider.net/2009/creode/creode.py.html" rel="nofollow">my own script</a> to &#8220;double decode&#8221; the binary mess in my SQL dump file back into sane text. Perhaps it can be of some help, if you&#8217;re the scripting kind of person.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kim Sullivan</title>
		<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-484613</link>
		<dc:creator>Kim Sullivan</dc:creator>
		<pubDate>Sun, 22 Nov 2009 15:10:36 +0000</pubDate>
		<guid isPermaLink="false">http://meyerweb.com/eric/thoughts/?p=1214#comment-484613</guid>
		<description>comments: tl;dr, just that I&#039;ve had exactly zero success with dumping the database (via phpMyAdmin) and reimporting.

I&#039;ve run across this many times. The problem is that MySQL is encoding-aware, and once you get bogus data in the database, no amount of recoding between different encodings or setting &quot;set names&quot; will help (in fact, the worst thing you can do is try to repair it by simply setting the correct encoding for your tables - if the target encoding doesn&#039;t have a matching character, &lt;em&gt;it gets irreversibly replaced by ?&lt;/em&gt;).
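
The irreversible replacement can be imitated outside the database. The sketch below is an analogy, not MySQL itself: Python's errors="replace" handler stands in for the substitution MySQL performs, and the sample word and function name are assumptions chosen for illustration.

```python
def convert_lossy(text, target_encoding):
    """Encode text, substituting '?' for anything the target cannot represent."""
    return text.encode(target_encoding, errors="replace")


# Hypothetical sample: a Czech word whose accented letters exist in cp1250 but not in latin1.
sample = "\u010de\u0161tina"

as_cp1250 = sample.encode("cp1250")           # every character has a byte
as_latin1 = convert_lossy(sample, "latin-1")  # two characters become '?'

print(as_cp1250.decode("cp1250") == sample)   # round trip survives
print(as_latin1)                              # the '?' bytes carry no trace of the originals
```

Once the question marks are stored, no later change of encoding can tell them apart from question marks that were typed on purpose.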

There&#039;s a simple workaround that worked for me many times.

First, change the database column type from TEXT (or varchar) to BLOB (or VARBINARY). This makes MySQL &quot;forget&quot; about any encoding it thinks the data is in, and prevents any recoding of the data that goes on behind the scenes (in my case, the data was often encoded in CP1250 or UTF-8, but column encoding was set to LATIN1).

Then you have to find out what encoding the data is in, and reset the column type to text/varchar/char with the encoding that matches the physical encoding of the data (in my case often CP1250). Once the physical encoding in the database matches the &quot;logical&quot; encoding of the columns, it&#039;s possible to simply change the encoding of the columns (to UTF-8), and with the correct SET NAMES, you can have your webpage output anything you want (from UTF-8 to LATIN1).

When the data is double encoded (it originally was in UTF-8, it got reimported into the database as latin1 and then the encoding of the columns changed to UTF-8), you first have to set the encoding of the columns back to what it was when it was imported - this changes doubly encoded UTF-8 to physically singly encoded UTF-8 that the database thinks is in LATIN1 (for example), and then you go the route from TEXT (latin1) -&gt; BLOB -&gt; TEXT(UTF-8).
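
The double encoding described above, and its reversal, can be reproduced in a few lines of Python. This is a sketch under assumptions: the function names are invented for illustration, and Python's latin-1 codec is not identical to MySQL's latin1 (which follows cp1252), so bytes in the 0x80 to 0x9F range may behave differently in a real database.

```python
def double_encode(text):
    """Encode as UTF-8, misread the bytes as latin1, then encode as UTF-8 again."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def repair(double_encoded):
    """Undo one layer: decode UTF-8, map back to latin1 bytes, decode as UTF-8."""
    return double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")


# Sample surname starting with a C-cedilla, written with an escape to keep the source ASCII.
name = "\u00c7elik"

broken = double_encode(name)
print(broken)                   # two bytes have become four
print(repair(broken) == name)   # the original text comes back
```

The repair only works while every byte is still present, which is why it has to happen before any lossy conversion inserts question marks.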

I think I have seen some scripts that try to do this automatically (by being really smart and getting information from the data dictionary), but for smaller scale databases such as wordpress, doing everything manually might be more tedious, but I think it&#039;s safer.

A few short points:
1. It is vital to get the physical encoding to match the encoding that is set in the table column type (I&#039;m not sure if search and replace will help because it works on already encoded data)
2. The encoding that is set in the HTML pages only determines what encoding the browser sends to PHP
3. PHP doesn&#039;t know (or, unfortunately, care) what encoding you get from the browser. GIGO.
4. MySQL cares about the encoding of the data from the browser (and what encoding it sends back). Use the &quot;SET NAMES&quot; SQL command to tell the database this information (AFAIK, WP does this).
5. The database performs a lot of conversion behind the scenes - if the database thinks you send it data in latin1, but you have table columns in UTF-8, it WILL do a conversion from &lt;em&gt;latin1 to utf-8&lt;/em&gt;, even if the data already was in UTF-8 (or worse, cp1250).
6. Changing the encoding of a column from one encoding to another performs physical recoding of the data, so you have to roundtrip it via BLOB or BINARY.
7. Once you try to convert two incompatible encodings, MySQL will insert a question mark (physically) for every character it can&#039;t convert (happens for example when changing between CP1250 and LATIN1, or importing UTF-8 data as UTF-8 data in table columns that have their encoding set to LATIN1).</description>
		<content:encoded><![CDATA[<p>comments: tl;dr, just that I&#8217;ve had exactly zero success with dumping the database (via phpMyAdmin) and reimporting.</p>
<p>I&#8217;ve run across this many times. The problem is that MySQL is encoding-aware, and once you get bogus data in the database, no amount of recoding between different encodings or setting &#8220;set names&#8221; will help (in fact, the worst thing you can do is try to repair it by simply setting the correct encoding for your tables &#8211; if the target encoding doesn&#8217;t have a matching character, <em>it gets irreversibly replaced by ?</em>).</p>
<p>There&#8217;s a simple workaround that worked for me many times.</p>
<p>First, change the database column type from TEXT (or VARCHAR) to BLOB (or VARBINARY). This makes MySQL &#8220;forget&#8221; about any encoding it thinks the data is in, and prevents any recoding of the data behind the scenes (in my case, the data was often encoded in CP1250 or UTF-8, but column encoding was set to LATIN1).</p>
<p>Then you have to find out what encoding the data is actually in, and reset the column type to TEXT/VARCHAR/CHAR with the encoding that matches the physical encoding of the data (in my case often CP1250). Once the physical encoding in the database matches the &#8220;logical&#8221; encoding of the columns, it&#8217;s possible to simply change the encoding of the columns (to UTF-8), and with the correct SET NAMES, you can have your webpage output anything you want (from UTF-8 to LATIN1).</p>
<p>When the data is double encoded (it originally was in UTF-8, it got reimported into the database as latin1 and then the encoding of the columns changed to UTF-8), you first have to set the encoding of the columns back to what it was when it was imported &#8211; this changes doubly encoded UTF-8 to physically singly encoded UTF-8 that the database thinks is in LATIN1 (for example), and then you go the route from TEXT (latin1) -&gt; BLOB -&gt; TEXT(UTF-8).</p>
<p>I think I have seen some scripts that try to do this automatically (by being really smart and getting information from the data dictionary), but for smaller-scale databases such as WordPress, doing everything manually may be more tedious but is, I think, safer.</p>
<p>A few short points:<br />
1. It is vital to get the physical encoding to match the encoding that is set in the table column type (I&#8217;m not sure if search and replace will help because it works on already encoded data)<br />
2. The encoding that is set in the HTML pages only determines what encoding the browser sends to PHP<br />
3. PHP doesn&#8217;t know (or, unfortunately, care) what encoding you get from the browser. GIGO.<br />
4. MySQL cares about the encoding of the data from the browser (and what encoding it sends back). Use the &#8220;SET NAMES&#8221; SQL command to tell the database this information (AFAIK, WP does this).<br />
5. The database performs a lot of conversion behind the scenes &#8211; if the database thinks you are sending it data in latin1, but your table columns are in UTF-8, it WILL do a conversion from <em>latin1 to utf-8</em>, even if the data already was in UTF-8 (or worse, cp1250).<br />
6. Changing the encoding of a column from one encoding to another performs physical recoding of the data, so you have to roundtrip it via BLOB or BINARY.<br />
7. Once you try to convert between two incompatible encodings, MySQL will insert a question mark (physically) for every character it can&#8217;t convert (this happens, for example, when changing between CP1250 and LATIN1, or importing UTF-8 data as UTF-8 data into table columns that have their encoding set to LATIN1).</p>
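<p>The byte-level reasoning behind points 5 to 7 can be modelled outside the database. What follows is only a sketch in Python, not MySQL itself: a bytes value stands in for a BLOB column, the sample name is made up for illustration, and CP1250 text stored under a LATIN1 label is the assumed starting state.</p>

```python
# Illustrative model only: Python bytes play the role of a BLOB column.
# The sample value and the CP1250-under-LATIN1 scenario are assumptions.

# Physical bytes on disk: CP1250 text in a column labelled LATIN1.
stored = "Dvořák".encode("cp1250")

# Converting while trusting the wrong label reads the bytes as LATIN1,
# so the result is valid UTF-8 that holds the wrong characters.
mislabelled = stored.decode("latin-1").encode("utf-8")

# The BLOB round trip: leave the bytes alone, attach the correct label
# (CP1250), and only then convert to UTF-8.
relabelled = stored.decode("cp1250").encode("utf-8")

# Point 7: a target encoding with no slot for a character loses it for good.
lossy = "Dvořák".encode("latin-1", errors="replace")

print(mislabelled.decode("utf-8"))  # wrong characters, still valid UTF-8
print(relabelled.decode("utf-8"))   # the original text
print(lossy)                        # contains a literal question mark
```

<p>The third result is the important one: once the question mark has been written, no later relabelling can bring the original character back.</p>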
]]></content:encoded>
	</item>
	<item>
		<title>By: Eric Meyer</title>
		<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-484440</link>
		<dc:creator>Eric Meyer</dc:creator>
		<pubDate>Sat, 21 Nov 2009 02:44:44 +0000</pubDate>
		<guid isPermaLink="false">http://meyerweb.com/eric/thoughts/?p=1214#comment-484440</guid>
		<description>Okay, now that&#039;s five recommendations to dump and re-import after I already tried that and it didn&#039;t work.  Made things much, much worse, in fact.

&lt;a href=&quot;http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-484401&quot; rel=&quot;nofollow&quot;&gt;Tantek&lt;/a&gt;, I sort of agree with you, but there are things WP does for me that hand-rolling wouldn&#039;t provide.  Like comments, for example, which I am emphatically &lt;em&gt;not&lt;/em&gt; willing to outsource to a third-party cloud service; and which simply listing inbound links does not come close to replicating.  Perhaps there are solutions now that would do all this but not rely on a database, but I don&#039;t remember seeing any back in 2004.

&lt;a href=&quot;http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-484409&quot; rel=&quot;nofollow&quot;&gt;Jeff&lt;/a&gt;, I believe that if I freshly installed WP in 2009, it would set things up using UTF-8 and there&#039;d be no issue.  I installed it almost six years ago, though.  Things have advanced a bit since then.</description>
		<content:encoded><![CDATA[<p>Okay, now that&#8217;s five recommendations to dump and re-import after I already tried that and it didn&#8217;t work.  Made things much, much worse, in fact.</p>
<p><a href="http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-484401" rel="nofollow">Tantek</a>, I sort of agree with you, but there are things WP does for me that hand-rolling wouldn&#8217;t provide.  Like comments, for example, which I am emphatically <em>not</em> willing to outsource to a third-party cloud service; and which simply listing inbound links does not come close to replicating.  Perhaps there are solutions now that would do all this but not rely on a database, but I don&#8217;t remember seeing any back in 2004.</p>
<p><a href="http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-484409" rel="nofollow">Jeff</a>, I believe that if I freshly installed WP in 2009, it would set things up using UTF-8 and there&#8217;d be no issue.  I installed it almost six years ago, though.  Things have advanced a bit since then.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Philip Tellis</title>
		<link>http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comment-484413</link>
		<dc:creator>Philip Tellis</dc:creator>
		<pubDate>Sat, 21 Nov 2009 00:21:35 +0000</pubDate>
		<guid isPermaLink="false">http://meyerweb.com/eric/thoughts/?p=1214#comment-484413</guid>
		<description>This looks like a classic double-encoding problem.

That is, the data stored in your database is UTF-8, but something in between the database and your HTML output thinks that it&#039;s LATIN-1, so it goes ahead and converts it to UTF-8.  The result is that multi-byte characters get treated as multiple single-byte characters, each of which is converted into its own multi-byte UTF-8 character.

Skimming through your comments, it looks like your database is set to latin1.  Check if your tables are also set to latin1 or utf-8 (SHOW CREATE TABLE).

If they are latin1, then what you have to do is dump all your data out as latin1 or binary (so that it doesn&#039;t do any conversion).  You could just use mysqldump to do this and it will use the default character set.

After that, you need to delete all data in the database.  This can be done by issuing a DROP DATABASE.  This will break your website, and probably give the first visitor the option of creating a new blog, so you might want to turn off access to your blog while this is happening.

After you&#039;ve dropped the database, do two things.  First, in the /etc/my.cnf file, make sure the default character sets are set to utf8; you&#039;ll need to check the MySQL website for the specific names they use for collation.  I think it&#039;s utf8_general_ci or utf8_unicode_ci.  Secondly, go through the dump file in a text editor and make sure none of the tables have the latin1 charset.  If they do, you need to change this as well (make a backup of the file first in case a typo gets in there).

Once done, recreate the DB using mysql &lt; dump-file.sql

After this, all your data should still be in utf8, but now mysql actually knows it&#039;s in utf8.  You can verify by doing a SHOW CREATE TABLE on all your tables and make sure they&#039;re set to utf8.

Now re-enable your blog and hope it works.

Note that Wordpress may have a way to dump the data and re-import it, but the dumped data may have the same problems that you&#039;re seeing.  One option is to maybe tell wordpress that you expect latin1 data in the db and that you want your output to also be latin1.  That way WP will ask MySQL for latin1 data (no conversion done) and will output latin1 data (no conversion done), and what you get is what is in the db.  Then before re-importing, change the charsets everywhere to utf8.

Note that I don&#039;t use WP, so have not tested this on my own with WP.

@Tantek: while I agree with you that users who don&#039;t want to be DBAs shouldn&#039;t be writing their data to a DB that they control, the problem here is neither the database&#039;s fault, nor the user&#039;s fault.  It appears to be a piece of software (WP?) making assumptions about the data and the database&#039;s character set.  One of those it controls, and the other is easy to check.</description>
		<content:encoded><![CDATA[<p>This looks like a classic double-encoding problem.</p>
<p>That is, the data stored in your database is UTF-8, but something in between the database and your HTML output thinks that it&#8217;s LATIN-1, so it goes ahead and converts it to UTF-8.  The result is that multi-byte characters get treated as multiple single-byte characters, each of which is converted into its own multi-byte UTF-8 character.</p>
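<p>A minimal sketch of that double encoding in Python, assuming the intermediate layer treats the bytes as LATIN-1 exactly as described above. The name is borrowed from the example in Eric&#8217;s post; the variable names are illustrative only.</p>

```python
# Sketch of double encoding; the variable names are illustrative.
original = "Björklund"

utf8_bytes = original.encode("utf-8")      # what is really stored
misread = utf8_bytes.decode("latin-1")     # a layer assumes LATIN-1...
double_encoded = misread.encode("utf-8")   # ...and "converts" it to UTF-8

# Each byte of the two-byte sequence for the accented letter has now
# become a character of its own.
print(double_encoded.decode("utf-8"))

# Undoing the two steps in reverse order recovers the text, provided
# nothing was replaced or truncated along the way.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired)
```

<p>The stored value grows from 10 bytes to 12 here, which is one rough way to spot double-encoded rows.</p>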
<p>Skimming through your comments, it looks like your database is set to latin1.  Check if your tables are also set to latin1 or utf-8 (SHOW CREATE TABLE).</p>
<p>If they are latin1, then what you have to do is dump all your data out as latin1 or binary (so that it doesn&#8217;t do any conversion).  You could just use mysqldump to do this and it will use the default character set.</p>
<p>After that, you need to delete all data in the database.  This can be done by issuing a DROP DATABASE.  This will break your website, and probably give the first visitor the option of creating a new blog, so you might want to turn off access to your blog while this is happening.</p>
<p>After you&#8217;ve dropped the database, do two things.  First, in the /etc/my.cnf file, make sure the default character sets are set to utf8; you&#8217;ll need to check the MySQL website for the specific names they use for collation.  I think it&#8217;s utf8_general_ci or utf8_unicode_ci.  Secondly, go through the dump file in a text editor and make sure none of the tables have the latin1 charset.  If they do, you need to change this as well (make a backup of the file first in case a typo gets in there).</p>
<p>Once done, recreate the DB using mysql &lt; dump-file.sql</p>
<p>After this, all your data should still be in utf8, but now mysql actually knows it&#039;s in utf8.  You can verify by doing a SHOW CREATE TABLE on all your tables and make sure they&#039;re set to utf8.</p>
<p>Now re-enable your blog and hope it works.</p>
<p>Note that Wordpress may have a way to dump the data and re-import it, but the dumped data may have the same problems that you&#039;re seeing.  One option is to maybe tell wordpress that you expect latin1 data in the db and that you want your output to also be latin1.  That way WP will ask MySQL for latin1 data (no conversion done) and will output latin1 data (no conversion done), and what you get is what is in the db.  Then before re-importing, change the charsets everywhere to utf8.</p>
<p>Note that I don&#039;t use WP, so have not tested this on my own with WP.</p>
<p>@Tantek: while I agree with you that users who don&#039;t want to be DBAs shouldn&#039;t be writing their data to a DB that they control, the problem here is neither the database&#039;s fault, nor the user&#039;s fault.  It appears to be a piece of software (WP?) making assumptions about the data and the database&#039;s character set.  One of those it controls, and the other is easy to check.</p>
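<p>The &#8220;go through the dump file in a text editor&#8221; step could also be scripted. The sketch below is in Python and rests on assumptions: <code>relabel_dump</code> is a hypothetical helper, and the patterns assume the dump spells its charset clauses as <code>CHARSET=latin1</code> or <code>CHARACTER SET latin1</code>, which should be checked against the real file first.</p>

```python
import re

# Hypothetical helper. The patterns assume the usual spelling of charset
# clauses in a dump and should be verified against the actual file.
_CHARSET = re.compile(r"\b(CHARSET=|CHARACTER SET )latin1\b", re.IGNORECASE)
_COLLATE = re.compile(r"\s+COLLATE[ =]latin1_\w+", re.IGNORECASE)


def relabel_dump(sql: str, target: str = "utf8") -> str:
    """Rewrite latin1 charset declarations in dump text.

    This is a blunt text substitution: it would also rewrite the same
    phrases if they happen to occur inside row data.
    """
    sql = _COLLATE.sub("", sql)  # drop latin1 collations so the default applies
    return _CHARSET.sub(lambda m: m.group(1) + target, sql)


sample = "CREATE TABLE wp_demo (body text) ENGINE=MyISAM DEFAULT CHARSET=latin1;"
print(relabel_dump(sample))
```

<p>As with the manual edit, work on a copy of the dump, and review the result before importing it.</p>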
]]></content:encoded>
	</item>
</channel>
</rss>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head profile="http://gmpg.org/xfn/1">
<title>meyerweb.com</title>
<link rel="openid.server" href="http://www.myopenid.com/server">
<link rel="openid.delegate" href="http://emeyer.myopenid.com/">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><link rel="shortcut icon" href="/favicon.ico"><link rel="home" href="http://meyerweb.com/" title="Home" ><link rel="stylesheet" href="http://meyerweb.com/ui/meyerweb.css" type="text/css" media="screen, projection"><link rel="stylesheet" href="http://meyerweb.com/ui/theme.css" type="text/css" media="screen, projection" id="themeLink"><link rel="stylesheet" href="http://meyerweb.com/ui/print.css" type="text/css" media="print"><script src="http://meyerweb.com/ui/addresses.js" type="text/javascript"></script><link rel="stylesheet" href="/ui/wordpress.css" type="text/css" media="screen">
<link rel="stylesheet" href="/ui/tfe.css" type="text/css" media="screen">
<link rel="stylesheet" href="/ui/home.css" type="text/css" media="screen">
<link rel="alternate" type="application/rss+xml" title="Thoughts From Eric" href="/eric/thoughts/rss2/full" />
<link rel="alternate" type="application/rss+xml" title="Thoughts From Eric (only technical posts)" href="/eric/thoughts/category/tech/rss2/full" />
<link rel="alternate" type="application/rss+xml" title="Thoughts From Eric (only personal posts)" href="/eric/thoughts/category/personal/rss2/full" />
<link rel="alternate" type="application/rss+xml" title="Distractions" href="/eric/thoughts/recent-links/rss2" />
<link rel="alternate" type="application/rss+xml" title="Excuse of the Day" href="/feeds/excuse/rss20.xml" />
</head>
<body id="www-meyerweb-com" class="hpg">

<div id="sitemast"><h1><a href="/"><span>meyerweb</span>.com</a></h1></div><div id="search"><h4>Exploration</h4><!-- SiteSearch Google --><form method="get" action="http://www.google.com/custom" target="_top"><div><input type="hidden" name="domains" value="meyerweb.com"></input><label for="sbb" style="display: none">Submit search form</label><input type="submit" name="sa" value="Google Search" id="sbb"></input><label for="sbi" style="display: none">Enter your search terms</label><input type="text" name="q" size="31" maxlength="255" value="" id="sbi"></input><p><input type="radio" name="sitesearch" value="meyerweb.com" checked id="ss1"></input><label for="ss1" title="Search meyerweb.com">meyerweb.com</label><input type="radio" name="sitesearch" value="" id="ss0"></input><label for="ss0" title="Search the Web">Web</label></p><input type="hidden" name="client" value="pub-3772084027748653"></input><input type="hidden" name="forid" value="1"></input><input type="hidden" name="ie" value="ISO-8859-1"></input><input type="hidden" name="oe" value="ISO-8859-1"></input><input type="hidden" name="safe" value="active"></input><input type="hidden" name="cof" value="GALT:#008000;GL:1;DIV:#336699;VLC:663399;AH:center;BGC:FFFFFF;LBGC:336699;ALC:0000FF;LC:0000FF;T:000000;GFNT:0000FF;GIMP:0000FF;FORID:1"></input><input type="hidden" name="hl" value="en"></input></div></form><!-- SiteSearch Google --><!-- <form method="get" action="http://www.google.com/custom"><div><input type="submit" name="sa" value="Search"><input type="text" name="q" size="20" maxlength="255" value=""><input type="hidden" name="sitesearch" value="meyerweb.com"></div></form><small><a href="http://www.google.com/search">Powered by Google</a></small> --></div><div id="main"><div class="skipper">Skip to: <a href="#extra">site navigation/presentation</a></div><div class="skipper">Skip to: <a href="#thoughts">Thoughts From Eric</a></div>
<div id="thoughts">


<div class="entry">
<h3><a href="http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/" rel="bookmark" title="Permanent Link: Correcting Corrupted Characters">Correcting Corrupted Characters</a></h3>
<ul class="meta">
<li class="date">Thu 19 Nov 2009</li>
<li class="time">0912</li>
<li class="cat"><a href="http://meyerweb.com/eric/thoughts/category/tech/web/" title="View all posts in Web" rel="category tag">Web</a><br> <a href="http://meyerweb.com/eric/thoughts/category/tech/wordpress/" title="View all posts in WordPress" rel="category tag">WordPress</a></li>
<li class="cmt"><a href="http://meyerweb.com/eric/thoughts/2009/11/19/correcting-corrupted-characters/#comments">71 responses</a></li>
<li></li><li></li></ul>

<div class="text">
<p>
At some point, for some reason I cannot quite fathom, a WordPress or PHP or MySQL or some other upgrade took all of my WordPress database&#8217;s UTF-8 and translated it to (I believe) ISO-8859-1 and then dumped the result right back into the database.  So <a href="http://meyerweb.com/eric/thoughts/2009/01/22/using-http-headers-to-serve-styles/#comment-438043">&#8220;Emil Björklund&#8221; became &#8220;Emil Bj&Atilde;&para;rklund&#8221;</a>.  <del datetime="2009-11-19T22:00:09+00:00">(If those looked the same to you, then I see &#8220;Bj&Atilde;&para;rklund&#8221; for the second one, and you should tell me which browser and OS you&#8217;re using in the comments</del>.)  This happened all throughout the WordPress database, including to commonly-used characters like &#8216;smart&#8217; quotes, both single and double; em and en dashes; ellipses; and so on.  It also apparently happened in all the DB fields, so not only were posts and comments affected, but commenters&#8217; names as well (for example).
</p>
<p>
And I&#8217;m pretty sure this isn&#8217;t just a case of the correct characters lurking in the DB and being downsampled on their way to me, as I have WordPress configured to use UTF-8, the site&#8217;s <code>head</code> contains a <code>meta</code> that declares UTF-8, and a peek at the HTTP response headers shows that I&#8217;m serving UTF-8.  Of course, I&#8217;m not really expert at this, so it&#8217;s possible that I&#8217;ve misunderstood or misinterpreted, well, just about anything.  To be honest, I find it deeply objectionable that this kind of stuff is still a problem here on the eve of 2010, and in general, enduring the effluvia of erroneous encoding makes my temples throb in a distinctly unhealthy fashion.
</p>
<p>
Anyway.  Moving on.
</p>
<p>
I found <a href="http://wordpress.org/extend/plugins/search-and-replace/">a search-and-replace plugin</a>&#8212;ironically enough, one written by a person whose name contains a character that would currently be corrupted in my database&#8212;that lets me fix the errors I know about, one at a time.  But it&#8217;s a sure bet there are going to be tons of these things littered all over the place and I&#8217;m not likely to find them all, let alone be able to fix them all by hand, one find-and-replace at a time.
</p>
<p>
What I need is a WordPress plugin or something that will find the erroneous character strings in various fields and turn them back into good old UTF-8.  Failing that, I need a good table that shows the ISO-8859-1 equivalents of as many UTF-8 characters as possible, or else a way to generate that table for myself.  With that table in hand, I at least have a chance of writing a plugin to go through and undo the mess.  I might even have it monitor the DB to see if it happens again, and give me a big &#8220;Clean up!&#8221; button if it does.
</p>
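<p>
One way such a table might be generated, sketched in Python. It assumes the damage is UTF-8 bytes misread as Windows-1252 (the code page that MySQL&#8217;s latin1 is modelled on); that is a guess about the cause, not something established here. The function names and the short character list are illustrative, not exhaustive.
</p>

```python
# Sketch only: assumes UTF-8 bytes were misread as Windows-1252.
# Function names and the character list are illustrative.

def misread(ch: str) -> str:
    """Return what one character looks like after the assumed corruption."""
    out = []
    for byte in ch.encode("utf-8"):
        try:
            out.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            out.append(chr(byte))  # byte unassigned in cp1252: keep its code point
    return "".join(out)


def build_table(chars: str) -> dict[str, str]:
    """Map each corrupted string to the character it came from."""
    return {misread(ch): ch for ch in chars}


def repair(text: str, table: dict[str, str]) -> str:
    """Replace corrupted strings, longest first so shorter keys cannot clash."""
    for bad in sorted(table, key=len, reverse=True):
        text = text.replace(bad, table[bad])
    return text


# o-umlaut, curly single and double quotes, en dash, em dash, ellipsis
table = build_table("\u00f6\u2019\u201c\u201d\u2013\u2014\u2026")
print(repair("Emil BjÃ¶rklund", table))
```

<p>
A table like this cannot tell corruption apart from text that legitimately contains the same character pairs, so any bulk replacement would want a review step before it touches the database.
</p>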
<p>
So: anyone got some pointers they could share, information that might help, even code that might make the whole thing go away?
</p>
</div>

</div>

</div>
<p style="font-size: 90%; text-align: right; margin-top: 0.5em; padding-top: 0;">(If you care, there's even an <a href="/eric/thoughts/page/2/">archive of previous thoughts</a>...)</p>

</div><div id="extra"><div class="panel" id="archipelago"><h4>Identity Archipelago</h4><ul><li><a href="http://flickr.com/photos/meyerweb/" rel="me">Flickr</a></li><li><a href="http://twitter.com/meyerweb/" rel="me">Twitter</a></li><li><a href="http://dopplr.com/traveller/meyerweb">Dopplr</a></li><li><a href="http://www.linkedin.com/in/meyerweb" rel="me">LinkedIn</a></li><li><a href="http://technorati.com/profile/emeyer" rel="me">Technorati</a></li></ul></div><div class="panel" id="pointers"><h4>Projects Elsewhere</h4><ul><li><a href="http://aneventapart.com/">An Event Apart</a></li><li><a href="http://complexspiral.com/">Complex Spiral Consulting</a></li><li><a href="http://www.webassist.com/go/css/emeyer/">CSS Sculptor</a></li><li><a href="http://css-discuss.org/">css-discuss</a></li><li><a href="http://microformats.org/">Microformats</a></li><li><a href="http://s5project.org/">S5</a></li></ul></div><div class="panel" id="tour"><ul><li><a href="http://fray.com/issue3/"><img src="http://fray.com/images/i3c.gif" alt="Fray Contributor (Issue 3: Sex &amp; Death)" /></a></li><!-- <li><a href="http://www.webassist.com/go/css/emeyer/"><img src="/pix/CS_ad_180x109.jpg" alt="CSS Sculptor for Dreamweaver" style="max-width: 100%;" /></a></li> --></ul></div><div class="panel">
</div><div id="sideblog" class="panel">
</div>


<div class="panel" id="extras">
<h4>Extras</h4>
<ul>
<li><a href="/feeds/">Feeds</a> &#8226;</li>
<li><a href="/eric/faq.html">FAQ</a> &#8226;</li>
<li><a href="/family.html">Family</a></li>
</ul>
</div>

</div>

<div id="navigate">
<h4>Navigation</h4>
<ul id="navlinks">
<li id="archLink"><a href="/eric/thoughts/">Archives</a></li>
<li id="cssLink"><a href="/eric/css/">CSS</a></li>
<li id="toolsLink"><a href="/eric/tools/">Toolbox</a></li>
<li id="writeLink"><a href="/eric/writing.html">Writing</a></li>
<li id="speakLink"><a href="/eric/talks/">Speaking</a></li>
<li id="otherLink"><a href="/other/">Leftovers</a></li>
<li id="aboutsite"><a href="/ui/about.html">About this site</a></li>
</ul>
</div>

<div id="footer">
<p class="sosumi">All contents of this site, unless otherwise noted, are &copy;1995-2008 <strong>Eric A. and Kathryn S. Meyer</strong>.  All Rights Reserved.</p>
<p>"<a href="/eric/thoughts/">Thoughts From Eric</a>" is powered by the &uuml;bercool <a href="http://wordpress.org/">WordPress</a></p>
</div>
</body>
</html>
