Python, MySQLdb, and UTF-8

December 31st, 2006

I just spent a lot of time trying to parse some UTF-8 XML and put parts of it into a MySQL database with Python. The UTF-8 parsing I already had working, but getting MySQLdb to use UTF-8 instead of latin1 was very annoying. I did this with MySQL-python-1.2.1_p2 and MySQL 4.1.21.

The first thing to do is to make sure you know the basics of Unicode.
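
As a quick refresher, here is a minimal sketch of the one distinction that matters for the rest of this post: text versus the bytes it is encoded to. It assumes Python 3 (under the Python 2 of the time the same split was unicode versus str), and the sample string is made up for illustration.

```python
# A made-up sample string containing one non-ASCII character.
text = "caf\u00e9"  # 'café'

utf8_bytes = text.encode("utf-8")      # é becomes two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # é becomes one byte: 0xE9

# Decoding with the wrong codec does not fail; it silently garbles the text.
garbled = utf8_bytes.decode("latin-1")  # 'cafÃ©'

print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'
```

That silent garbling is what a latin1 connection does to UTF-8 data, which is why the settings in the next step matter.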

The second thing to do is to make sure that all of the settings in MySQL are set to UTF-8. I did this with phpMyAdmin and by setting the connection settings when initiating the connection with MySQLdb. Do not be alarmed if connection.character_set_name() still returns latin1.
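
For the MySQLdb side of those connection settings, a minimal sketch might look like the following. It assumes the `charset` and `use_unicode` keyword arguments that `MySQLdb.connect()` accepts in the 1.2.1 series; the helper name `utf8_connect_kwargs` and the host, user, password, and database values are hypothetical placeholders.

```python
def utf8_connect_kwargs(host, user, passwd, db):
    """Build keyword arguments for MySQLdb.connect() that request UTF-8.

    Hypothetical helper: only the 'charset' and 'use_unicode' keys are the
    point here; the rest are ordinary connection parameters.
    """
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": "utf8",    # ask the driver to talk UTF-8 to the server
        "use_unicode": True,  # hand text columns back as unicode objects
    }


# Placeholder credentials, not real ones.
kwargs = utf8_connect_kwargs("localhost", "someuser", "secret", "somedb")
print(kwargs["charset"])  # utf8

# With MySQLdb installed and a server running, the connection would be opened as:
#   import MySQLdb
#   connection = MySQLdb.connect(**kwargs)
```

Keeping the settings in one place makes it harder to open a second connection somewhere else that quietly falls back to latin1.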

The last thing I did was follow these instructions on how to create a proper UTF-8 query. Basically, do this:

cursor.execute('INSERT INTO table VALUES (id, %s)', (value.encode('utf-8'),))

Instead of this:

cursor.execute('INSERT INTO table VALUES (id, '+value.encode('utf8')+')')
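
To see why the first form is the safer one, here is a small runnable sketch. It uses the standard library's sqlite3 in place of MySQLdb so that it runs without a server, which is a substitution purely for the sake of a self-contained example; sqlite3 writes placeholders as ? where MySQLdb writes %s. The table name `items` and the sample value are made up.

```python
import sqlite3

# In-memory database standing in for the MySQL server (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE items (id INTEGER, name TEXT)")

value = "caf\u00e9 'special'"  # non-ASCII text that also contains quotes

# Parameterized form: the SQL and the value travel as two separate arguments,
# so the driver handles quoting and encoding of the value.
cur.execute("INSERT INTO items VALUES (?, ?)", (1, value))

# Concatenated form: the value is pasted into the SQL text, so its quotes
# end the string literal early and the statement is malformed.
try:
    cur.execute("INSERT INTO items VALUES (2, '" + value + "')")
    concatenation_failed = False
except sqlite3.Error:
    concatenation_failed = True

stored = cur.execute("SELECT name FROM items WHERE id = 1").fetchone()[0]
print(stored == value)       # True
print(concatenation_failed)  # True
```

The parameterized insert round-trips the text unchanged, while the concatenated one breaks as soon as the value contains a quote.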

There are probably parts of this that were unnecessary, but I am just happy it works.


One comment to “Python, MySQLdb, and UTF-8”


  1. stuffduff said:

    I like to build MySQL from scratch and --with-extra-charsets=complex has always been a part of my configure options. The reason that the top one works is that it is passed as two separate args, while the second one is just passing a single string. I’d say that what you did was most likely necessary. Good job!
