Fixing the dirify function in Movable Type
The dirify function Movable Type uses to turn post titles into legal directory names suitable for URLs has serious problems with some accented characters. I could not find a satisfactory fix anywhere, so I hacked up my own. While I was at it, I changed it to use hyphens instead of underscores to separate words in URLs.
I’ve seen the accented character bug before, but haven’t bothered to look for a solution since nearly all posts on 456 Berea Street are in English. However, when I was working on another site recently, I really needed to fix this to make Swedish post titles turn into reasonably readable URL fragments.
All I wanted dirify to do was convert any accented characters to their non-accented versions:
å => a
ä => a
ö => o
Å => a
Ä => a
Ö => o
Seems simple enough, but that was not what happened. Instead, Movable Type for some reason converts the characters this way:
ä => ae
ö => oe
Ä => ae
Ö => oe
That’s right. The letters “å” and “Å” are simply removed. Hey! They may look odd to the rest of the world but we use them a lot here! Replacing “ä” with “ae” and “ö” with “oe” is not what I want either.
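To make the difference concrete, here is a small Python sketch (Python rather than the Perl that Movable Type is written in, purely for illustration) that applies both mappings to a sample word. The two tables are transcribed from the lists above, not taken from Movable Type's source, and the names observed_table, wanted_table and convert are made up for this example.

```python
# Illustration only: these tables are copied from the lists in this post,
# not from Movable Type's source code.
observed_table = {"ä": "ae", "ö": "oe", "Ä": "ae", "Ö": "oe", "å": "", "Å": ""}
wanted_table = {"å": "a", "ä": "a", "ö": "o", "Å": "a", "Ä": "a", "Ö": "o"}


def convert(text, table):
    """Replace each character found in the table; leave the rest untouched."""
    return "".join(table.get(ch, ch) for ch in text)


title = "Räksmörgås"
print(convert(title, observed_table))  # Raeksmoergs
print(convert(title, wanted_table))    # Raksmorgas
```

The first result loses the “å” entirely and doubles up the other vowels, which is exactly the kind of URL fragment I wanted to avoid.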
Since I also wanted to use hyphens instead of underscores to separate words in URLs, I started looking for plugins that could help out with that, hoping that I might also stumble upon a solution to this character conversion problem. I found a few options: Dashify, Dirifyplus, and Dirify for Unicode. None of them fixed the problem. Dirifyplus (I think) actually made it even worse by converting all accented characters to “a”.
So in the end I decided to find Movable Type’s dirify function and fix it. After a bit of searching I found it in /lib/MT/Util.pm. The separator character is defined on line 544, and the conversion table starts on line 620 (assuming Movable Type 3.2 set up to use UTF-8).
To use hyphens instead of underscores, just change the separator character on line 544 from an underscore to a hyphen. To make dirify convert Swedish accented characters to something more usable, replace the my %utf8_table hash table with this patched utf8_table.
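As a rough illustration of what the two changes achieve together, here is a Python sketch of a dirify-like function. It is not Movable Type's implementation (that is Perl, in Util.pm), and the steps are simplified assumptions on my part: map the Swedish characters, lowercase, drop anything that isn't a letter, digit, space or hyphen, then join the words with the separator. The names SEPARATOR, SWEDISH_TABLE and dirify_sketch are hypothetical.

```python
import re

# Hypothetical stand-in for the separator setting; "_" would give the old behaviour.
SEPARATOR = "-"

# Hypothetical stand-in for the patched table, limited to the characters in this post.
SWEDISH_TABLE = {"å": "a", "ä": "a", "ö": "o", "Å": "a", "Ä": "a", "Ö": "o"}


def dirify_sketch(title, separator=SEPARATOR):
    # 1. Replace accented characters with their unaccented versions.
    text = "".join(SWEDISH_TABLE.get(ch, ch) for ch in title)
    # 2. Lowercase everything.
    text = text.lower()
    # 3. Remove anything that is not an ASCII letter, digit, whitespace or hyphen.
    text = re.sub(r"[^a-z0-9\s\-]", "", text)
    # 4. Collapse runs of whitespace and hyphens into a single separator.
    text = re.sub(r"[\s\-]+", separator, text.strip())
    return text.strip(separator)


print(dirify_sketch("Så gör du världens bästa kaffe!"))
# sa-gor-du-varldens-basta-kaffe
```

Passing separator="_" gives the underscore version of the same fragment, which mirrors the one-character change to the separator setting.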
Happy that I had managed to solve the problem, I went to apply the fix to my other site, Kaffesnobben. Well, that revealed other problems since that site is running Movable Type 3.17. First, the separator character is specified on line 457 instead of 544, and the my %utf8_table hash table starts on line 528 instead of 620. No big deal, it should work anyway, right? Wrong. After applying the patches the Swedish characters were converted properly, but underscores were still being used as separators.
After spending way too long trying to figure out why dirify wouldn’t use hyphens instead of underscores, I finally found the answer: I had installed patch-20050616-utf8dirify-nodash.pl, a plugin that fixes another dirify problem. Well, that plugin also overrides the separator character specified in Util.pm. So if you have this plugin installed, make sure to edit the separator character on line 20 of the plugin file too.
It took a while, but in the end I found a solution to my problem. Hopefully this post will save someone else a bit of frustration.