Thursday, July 15, 2010

How to upload UTF8 fax list into MYSQL

A client occasionally needs to upload large fax list (over 100,000 numbers) to our web-based fax broadcast platform.  The utf8 encoded list consists of China fax numbers, company and contact name among other information.
He soon encounters sesson timeout when uploading such a big file.
To solve this problem, we prepare some scripts to upload the list on behalf of him.
As the fax list will be eventually stored in MYSQL, we use the command line utility 'mysqlimport' to directly import the list on server side.
mysqlimport \
--user=<login_id> \
--password=<password> \
--default-character-set=utf8 \
--local \
--force \
--columns=col1,col2,col3,col4 \
--fields-optionally-enclosed-by='"' \
--fields-terminated-by='\t' \
--lines-terminated-by='\r\n' \
--ignore-lines=1 \
--delete \
--verbose \
--debug \
db_name source_fax_list.txt
Note that mysqlimport will take source_fax_list as the table name to upload.  The parameter 'local' is required, otherwise, mysql will assume your fax list is in mysql's installation directory.  The 'ignore-lines=1' will skip the header line in fax list.  The 'delete' parameter instructs mysql to truncate table before upload.  The 'default-character-set' parameter is also required to import utf8-encoded list.
It just takes a few seconds to upload over 100,000 numbers to our MYSQL.  Nevertheless, one drawback of mysqlimport is that we could not easily see the warnings or errors happened in upload.  A 'show warnings' would not display the required information.  An alternative is to use 'Load data local infile' in mysql session and issue 'show warnings' afterwards to inspect the upload warnings or errors.
Sometimes, we need to verify the contents of our list by querying the table, but only to see some funny characters.  It might be due to missing translation settings in our telnet client and 'charset' when querying mysql.
For example, we use PuTTY to telnet to our MYSQL server.
The PuTTY translation (Configuration|Window|Translation) should be set to UTF-8. 
When logging in mysql, we also need the parameter  '--default-character-set=utf8'. 
mysql -u <userid>  --password=<password> --default-character-set=utf8
Both are required to see the UTF-8 Chinese characters

No comments:

Post a Comment