TextMate and high level ASCII characters
April 14, 2008
Here’s a fun gotcha when creating commands for TextMate: if your command’s Output is set to Create New Document, then that output must either be free of high level (extended) ASCII characters or be encoded as UTF-8.
If your output doesn’t conform to this rule, the new document populated by TextMate will be blank. Nada, nothing, just blank.
Try this test:
- Create a file in TextMate with an extended character in it (umlauts are good, or say option-y), followed by some more text, to prove my point that nothing else comes through.
- Save, and set the file encoding to UTF-8
- Create a new command: set its Input to None, Output to Create New Document, and Command(s) to
cat path/to/your/saved/file
- Run the command
- Notice your output: just like you expected
- Now save your document in another encoding, say MacRoman. (Make sure to replace your old file: we want the command to open this file now!)
- Run the command
- Notice there’s no output
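If you want to check a file before pointing a command at it, one rough way is to ask iconv to read it as UTF-8 and see whether it complains. This is only a sketch: is_utf8 is a name made up for illustration, and it assumes an iconv that exits non-zero when it hits invalid input.

```shell
# is_utf8 FILE: succeed only if FILE decodes cleanly as UTF-8.
# (Hypothetical helper; plain ASCII files pass too, since ASCII is valid UTF-8.)
is_utf8() {
    # Convert UTF-8 to UTF-8 and throw the result away;
    # only the exit status matters here.
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

Something like `is_utf8 path/to/your/saved/file || echo "not UTF-8"` should tell you which of the two test files you are looking at.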
This might not seem like a big deal, but remember that any Unix command you run could return high ASCII text: grepping through a source tree of MacRoman files, let’s say.
In cases where those extended characters can be discarded, there’s a great tool from the University of California at San Diego: a Perl script called fix. Fix replaces extended ASCII characters (and other craziness) with spaces. (I like to put scripts like this in a bin folder in my home directory.)
So, if extended ASCII characters don’t matter to you, use fix like this:
cat path/to/your/saved/file | perl ~/bin/fix.pl -
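If you don’t have fix handy, tr can do a cruder version of the same job. To be clear, this is a stand-in and not the fix script itself: it only blanks out bytes above 127 and leaves any other craziness alone. The function name strip_high is hypothetical.

```shell
# strip_high: read stdin, replace every byte in the 128-255 range
# with a space, write the result to stdout.
strip_high() {
    # LC_ALL=C makes tr treat input as raw bytes instead of trying to
    # decode it; '[ *]' repeats the space to cover the whole range.
    LC_ALL=C tr '\200-\377' '[ *]'
}
```

Used the same way as above: cat path/to/your/saved/file | strip_high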
If you really do care about those characters, and you can assume the source encoding, you can use iconv:
cat path/to/your/saved/file | iconv -f 'macroman' -t 'utf-8'
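When a command might return either UTF-8 or MacRoman depending on the files it touches, a small wrapper can try UTF-8 first and only convert when that fails. This is a sketch under assumptions: to_utf8 is a made-up name, the default source encoding is the 'macroman' guess from above, and encoding names vary between iconv implementations (iconv -l lists the ones yours knows).

```shell
# to_utf8 [SOURCE_ENCODING]: read stdin, write UTF-8 to stdout.
# Input that is already valid UTF-8 passes through untouched; anything
# else is assumed to be in SOURCE_ENCODING (default: macroman).
to_utf8() {
    to_utf8_from="${1:-macroman}"
    # Buffer stdin in a temp file, because it has to be read twice.
    to_utf8_tmp=$(mktemp) || return 1
    cat > "$to_utf8_tmp"
    if iconv -f UTF-8 -t UTF-8 "$to_utf8_tmp" >/dev/null 2>&1; then
        cat "$to_utf8_tmp"
    else
        iconv -f "$to_utf8_from" -t UTF-8 "$to_utf8_tmp"
    fi
    to_utf8_status=$?
    rm -f "$to_utf8_tmp"
    return "$to_utf8_status"
}
```

In a TextMate command that would look like: cat path/to/your/saved/file | to_utf8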
After all that, it should be noted that the Output: Show as HTML setting can accept extended ASCII characters. But if your output isn’t HTML…