TextMate and high-level ASCII characters

April 14, 2008

Here’s a fun gotcha when creating commands for TextMate: if your command’s Output is set to Create New Document, then that output must be valid UTF-8. In practice that means it is either encoded as UTF-8 or free of high-level (extended) ASCII characters, since plain 7-bit ASCII is already valid UTF-8.

If your output doesn’t conform to this rule, the new document populated by TextMate will be blank. Nada, nothing, just blank.

Try this test:

Create a file in TextMate with an extended character in it (an umlaut is good, or type option-y). Add some text after it too, to prove my point that nothing else comes through.
Save, and set the file encoding to UTF-8.
Create a new command: set its Input to None, Output to Create New Document, and Command(s) to: cat path/to/your/saved/file
Run the command.
Notice your output: just what you expected.
Now save your document in another encoding, say MacRoman. (Make sure to replace your old file: we want the command to open this file now!)
Run the command again.
Notice there’s no output.
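
If you want to check from a terminal whether a file would trip this, here is a minimal sketch. It assumes the blank document really is caused by bytes that are not valid UTF-8, as described above, and it leans on iconv to do the validation. The function name is_valid_utf8 is my own invention, not anything TextMate provides.

```shell
# is_valid_utf8: succeed (exit 0) if stdin is valid UTF-8, fail otherwise.
# Plain 7-bit ASCII counts as valid UTF-8.
# Converting UTF-8 to UTF-8 makes iconv reject any invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}
```

For example: is_valid_utf8 < path/to/your/saved/file && echo "fine" || echo "will come out blank"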

This might not seem like a big deal, but remember that any Unix command you run could return high ASCII text: grepping through a source tree of MacRoman files, let’s say.

In cases where those extended characters can be discarded, there’s a great tool from the University of California at San Diego: a Perl script called fix. Fix replaces extended ASCII characters (and other craziness) with spaces. (I like to put scripts like this in a bin folder in my home directory.)

So, if extended ASCII characters don’t matter to you, use fix like this: cat path/to/your/saved/file | perl ~/bin/fix.pl -
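
If you don’t have fix handy, a rough stand-in can be sketched with tr. To be clear, this is not the fix script and it only approximates the one behavior described above (turning extended bytes into spaces); whatever else fix cleans up is not covered. The function name strip_high_bytes is hypothetical.

```shell
# strip_high_bytes: replace every byte in the range 0x80-0xFF with a space.
# LC_ALL=C makes tr work on raw bytes rather than multibyte characters.
# Note: a multibyte UTF-8 character becomes several spaces, one per byte.
strip_high_bytes() {
    LC_ALL=C tr '\200-\377' ' '
}
```

Use it the same way: cat path/to/your/saved/file | strip_high_bytes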

If you really do care about those characters, and you can assume the source encoding, you can use iconv: cat path/to/your/saved/file | iconv -f 'macroman' -t 'utf-8'
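
One catch with piping everything through iconv is that output which is already UTF-8 gets mangled if you convert it a second time. A possible sketch of a wrapper that only converts when it has to is below. The function name to_utf8 and the default encoding are my assumptions: MACINTOSH is the name most iconv builds use for MacRoman, but run iconv -l to see what yours calls it.

```shell
# to_utf8: read stdin and write UTF-8 to stdout.
# Valid UTF-8 input is passed through untouched; anything else is converted
# from an assumed source encoding (first argument, default MACINTOSH).
to_utf8() {
    src_encoding="${1:-MACINTOSH}"   # assumption: the source is MacRoman
    tmp=$(mktemp) || return 1
    cat > "$tmp"                     # buffer stdin so we can read it twice
    if iconv -f UTF-8 -t UTF-8 "$tmp" >/dev/null 2>&1; then
        cat "$tmp"                   # already valid UTF-8 (or plain ASCII)
    else
        iconv -f "$src_encoding" -t UTF-8 "$tmp"
    fi
    status=$?
    rm -f "$tmp"
    return $status
}
```

In a TextMate command that would look like: cat path/to/your/saved/file | to_utf8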

After all that, it should be noted that the Output: Show as HTML setting can accept extended ASCII characters. But if your output isn’t HTML…