[Coding] Extracting UTF-8 Encoded Text from SQLite

Posted by Khatharsis on August 27, 2013

I’ve been in a little bit of a coding slump and rather than push myself through it, I gave myself a break and did some other things like playing copious amounts of Fire Emblem. I came back this week with the intent to build another Thai Suite app (vocab “flashcards”), but ran into a roadblock when extracting a SQLite table to CSV so I could import it into Excel and run my handy script to get the JavaScript code I need. I’m using SQLite Database Browser (the PortableApps version) and while it exports to a .sql file, it had troubles exporting the Thai words to .csv. I even tried setting the encoding in the resulting text file to UTF-8, but that was no avail. So, I coded up a small program to extract that column and learned that the console doesn’t support characters other than what the code page supports (setting Console.Encoding to UTF-8 resulted in some interesting gibberish). Long story short, writing to a text file, which was my goal anyway, solved the problem. Getting there is the more interesting part.

I had a hard time finding already-written code for this problem and it’s possible I had it much earlier if I had thought to run the debugger and set a stop point before calling Console.Out(), but I had a lot of fun figuring out this problem for myself. Makes me feel a little better after my coding slump and I learned a few tidbits out of it as well, which I will share with you.

Since I have a SQLite database, I needed to connect to it via .NET. There is a rather handy library that can make this connection for you called System.Data.SQLite. The documentation on how to go about using it is rather poorly written and searching for a tutorial or two on how to get started goes a long way. The first step is downloading the appropriate version you want. I had grabbed the x64 version figuring I’m running a 64-bit OS and that then required setting the target platform for my C# Project to x64 as well. I learned that I also didn’t have to install it, but I downloaded a .exe that ran a couple of scripts? I did mention it was all quite confusing, yes? Well, I eventually figured out that I had to add a reference to the SQLite DLL and copy one of the files to my build folder for it to properly run (see the above link).

The next step was then figuring out what all the syntax was to get the data I wanted out of my database. In that particular link, I studied the helper class (SQLiteDatabase.cs) code and used what was relevant for my purposes. Except, I was getting question marks in my console output. I should, for better practice, go back and see if it actually extracts the Thai words properly, but then I wouldn’t have had a chance to learn about the hex method in SQLite and using magic to essentially transform it into the string I wanted.

Okay, so maybe it’s not “magic,” with smoke and mirrors, but the steps are fairly simple. Using a SQL query, nab the desired data as a hex string. Back in C#, convert the hex string to a byte array and then convert it into a string using UTF-8 encoding. Quite simple. I feel a little wasteful needing to write such little code just to extract Thai letters from the SQLite database for easy transfer into an Excel spreadsheet, but it might come in handy later on down the line. The code is below if you are interested (note: the StringToByteArray() method is one you can find on StackOverflow; it’s up to you what you want to use or write your own).

static void Main(string[] args)
{
    string outFile = @"path\to\tmp.txt";
    string db = @"path\to\vocab.db";
    SQLiteConnection conn = new SQLiteConnection(String.Format("Data Source={0}", db));
    try {
        conn.Open();
        SQLiteCommand cmd = new SQLiteCommand(conn);
        cmd.CommandText = "SELECT hex(thai) AS thai FROM vocab_masterlist ORDER BY id";
        SQLiteDataReader reader = cmd.ExecuteReader();

        using (StreamWriter sw = new StreamWriter(outFile))
        {
            while (reader.Read())
            {
                string hexStr = reader["thai"].ToString();
                byte[] bytes = StringToByteArray(hexStr);
                string word = System.Text.Encoding.UTF8.GetString(bytes);
                // Writing to Console doesn't work! Console.Out.WriteLine(word);
                sw.WriteLine(word);
            }
        }

    }
    catch (SQLiteException e) { }
    finally {
        conn.Close();
    }
}