Validating comments (and fixing comment preview)

Long-time readers, particularly those of you who post comments now and then, are probably aware of some ongoing problems I’ve been having with my commenting system. Commenters have had to use Markdown instead of HTML, code snippets were removed, and escaped entities were unescaped by the comment preview and then removed when the comment was posted.

Every now and then I’ve had a go at fixing the problems, but each time I got stuck. The situation was complicated by the fact that this site is using an old version of Movable Type for the back-end. Movable Type is written in Perl, which I am not too comfortable with. That obviously made it even more difficult for me to track down the problems I was having. It was very frustrating.

Yes, I considered upgrading Movable Type. But I rely on many different plugins and have patched the MT code quite a bit, so I am very nervous about upgrading. I expect something will break, and I don’t need the stress that would cause me. I looked at other platforms, but none of them looks good enough to motivate a switch. So I’m sticking with Movable Type since it works well enough for me. Except for the problems with comments.

However, most (all?) of those issues should now be fixed, in no small part thanks to the help I got from Jacques Distler when I had problems getting his MTValidate plugin up and running. Thanks, Jacques. I won’t forget those beers I promised you :-).

So what do the changes mean for anyone posting comments here? A number of things:

  1. Only HTML 4.01 Strict is accepted. If your comment contains invalid HTML, a list of validation errors will be displayed, and you will be asked to correct the errors. Your comment cannot be posted until you have done that. No more blockquote elements with inline content, which I complained about in Use only block-level elements in blockquotes ;-).
  2. Markdown is still the preferred method for marking up comments, but the HTML elements and attributes created by Markdown can be entered manually if you prefer doing that.
  3. Markdown should work properly now, so you can post code examples and use backticks (`) to create code elements.
  4. Escaped entities are no longer unescaped by the comment preview, so you don’t have to retype all those &lt; and &gt; entities that took you so long to enter. This has been a bug in my comment preview for a couple of years or so. It turned out to be an embarrassingly simple fix, again thanks to Jacques.
  5. There should be less spam. Not that I think most readers will have noticed much of it since I am pretty quick at removing spam (and I am considering more and more comments that look legitimate at first glance to be spam), but I do get hit by spam floods every now and then. Spam robots should find it harder to post comments now that I am Forcing Comment Previews.

In case you’ve accidentally entered invalid markup in a comment recently, now you know what’s up with the error message you got.

Update: Well, a bulletproof way of making any bugs come out in the open is to post an article claiming that those bugs are gone.

Shortly after posting this I found a problem related to the forced comment preview in combination with Markdown. When Markdown outputs HTML it somehow, and only sometimes, makes the commenting script think the comment has been edited between previewing and posting.

I’ll try to find out what’s causing it but it will have to wait. Apologies for any confusion.

Posted on June 18, 2007 in Movable Type

Comments

  1. Re: #4, I’m curious what the fix was. Thanks!

  2. June 18, 2007 by Roger Johansson (Author comment)

    Mike: All I had to do was add encode_html="1" to the MTCommentPreviewBody tag:

    <$MTCommentPreviewBody encode_html="1"$>

    Like I said, embarrassingly simple.
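
To illustrate why that attribute matters, here is a rough Python sketch. It is not Movable Type's actual Perl code; the function names `preview_body` and `browser_reads_back` are hypothetical, and `html.unescape` is only a crude stand-in for a browser decoding entities when the previewed text is written back into the form.

```python
import html


def preview_body(comment_text: str, encode: bool) -> str:
    """Return the markup that would be written into the preview form."""
    # With encode=True the special characters are escaped first,
    # which is roughly what an encode_html-style filter does.
    return html.escape(comment_text, quote=True) if encode else comment_text


def browser_reads_back(markup: str) -> str:
    """Crude stand-in for a browser decoding entities in form content."""
    return html.unescape(markup)


typed = "Use &lt;em&gt; for emphasis"

# Without escaping, the entities the commenter typed are decoded and lost.
without_fix = browser_reads_back(preview_body(typed, encode=False))

# With escaping, the commenter gets back exactly what was typed.
with_fix = browser_reads_back(preview_body(typed, encode=True))

print(without_fix)
print(with_fix)
```

In this sketch the unescaped preview hands back `Use <em> for emphasis`, while the escaped preview round-trips the original text unchanged.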

  3. Hmm... why not use a token to check whether a human is sending a comment or not? Another thing I find interesting: why did you decide to use Movable Type if you are not comfortable with Perl?

  4. June 18, 2007 by Roger Johansson (Author comment)

    @Rafal: What do you mean by using a token?

    When I started blogging Movable Type was the only workable option I could find, and I wasn’t expecting to be hacking around with Perl (which you really don’t have to unless you want to tweak things a lot).

  5. You can store a token in a session on the form page and in a hidden variable. When the form is processed, the token must match the token stored in the session (this prevents bot/telnet requests directly to your processing page). It isn’t foolproof, but it does prevent quite a bit. You can see some info related to PHP at: Foiling Cross Site Attacks

    I am also interested as to why you chose a platform you weren’t comfortable with. I see your posts related to JavaScript and frameworks where you really push for people to understand what is going on at the core (not just using a framework blindly). Would this not be the same for your own website? Wouldn’t it be beneficial to know what is going on at the core? Being scared to update sounds like it’s been hacked or patched to do what you want with quick solutions.

    I am not bashing - just wondering why you made that decision for your backend.
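
The session token scheme described in comment 5 could be sketched roughly like this in Python. Everything here is an assumption for illustration: the session is a plain dict, and the names `issue_token`, `is_valid_submission`, `comment_token` and the `token` form field are hypothetical, not taken from Movable Type or any particular framework.

```python
import hmac
import secrets


def issue_token(session: dict) -> str:
    """Create a one-time token, remember it in the session, and return it
    so it can be placed in a hidden form field."""
    token = secrets.token_hex(16)
    session["comment_token"] = token
    return token


def is_valid_submission(session: dict, form: dict) -> bool:
    """Accept the post only if the hidden field matches the session token."""
    # pop() makes the token single-use: a replayed request fails.
    expected = session.pop("comment_token", None)
    submitted = form.get("token")
    if expected is None or submitted is None:
        return False
    # Constant-time comparison of the two values.
    return hmac.compare_digest(expected.encode(), submitted.encode())


session = {}
token = issue_token(session)
first_try = is_valid_submission(session, {"token": token})
replay = is_valid_submission(session, {"token": token})
print(first_try, replay)
```

A request sent straight to the processing script, without first loading the form, has no matching token in its session and is rejected. As noted later in the thread, this relies on the visitor accepting the session cookie.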

  6. June 18, 2007 by Roger Johansson (Author comment)

    Nate:

    You can store a token in a session on the form page and in a hidden variable.

    Ah, I have something similar to that going on, yes. But it isn’t enough.

    Regarding my choice of platform: again, it really was the best option at the time (2003) and I was not expecting to be hacking it.

    That said, I’m not so sure I would choose differently if I started over today.

    I do HTML + CSS + JavaScript, I’m not a back-end programmer. I really do want to understand exactly what is going on though - that’s why I get so frustrated when I don’t.

  7. I just love it when you need a manual to comment… ;-)

    But seriously, good luck. It’s hard to get it to work smoothly while avoiding spam etc.

  8. Too bad you can’t ftp beer. I guess I’ll have to wait till we meet in person.

    Glad everything worked out.

    With MT4 coming out, now would not be the time to go through whatever pain was associated with upgrading to MT3.

    You can store a token in a session on the form page and in a hidden variable. When the form is processed, the token must match the token stored in the session (this prevents bot/telnet requests directly to your processing page)

    What he’s doing is like that … except better.

    Wouldn’t it be beneficial to know what is going on at the core?

    MovableType (v3; I don’t have a copy of version 2 handy) is over 130,000 lines of code. Despite having done a great deal of MT hacking, I could not seriously pretend to “know what is going on at the core.” And I’d be hard-pressed to imagine how anyone who hadn’t similarly spent a large amount of time hacking the software could do any better.

  9. June 19, 2007 by Roger Johansson (Author comment)

    Nate: One more thing about the token - it would only work if the user allows cookies. Most people do, but some don’t. I guess not being able to post a comment may not be an extremely serious problem, but I don’t like the idea.

    Robert: Yeah I know it sucks. Just write valid HTML and you should be fine ;-).

    Jacques: Sending beer over ftp would be cool :-).

    With MT4 coming out, now would not be the time to go through whatever pain was associated with upgrading to MT3.

    Nope. I’m waiting for MT 4 to become stable (and maybe even for 4.01) before upgrading… if I really need to upgrade.

  10. P.S.: Just a UI thing, but you might consider either

    1. Moving the “Post” button up to the preview display (as I’ve done on my blog), or
    2. Using an onchange handler to disable it, when the user edits the comment text.

    It’s a little disconcerting, for those who aren’t expecting it, to have both the “Preview” and “Post” buttons side by side, but only be allowed to do one of those two actions.

    There should be a way to let the user know when he’s allowed to press the “Post” button.

  11. June 19, 2007 by Roger Johansson (Author comment)

    Jacques: I’ll look into making the comment preview a bit friendlier once I’ve figured out why Markdown is interfering with the forced comment preview when you post code. I am guessing that encode_html is part of the problem.

    I thought everything was working fine, but apparently not.

  12. The problem is with the rather lame HTML sanitization filtering that MT does (in an apparently Markdown-incompatible way).

    Markdown escapes anything in backticks. Thus

    `&lt; &amp;`

    becomes

    &amp;lt; &amp;amp;

    whereas

    `<p> &</p>`

    becomes

    &lt;p&gt; &amp;&lt;/p&gt;

    All well and good, so long as the code in backticks is actually allowed by your sanitization filter. If you typed

    `<foo> &</foo>`

    instead, this would get screwed up on the way to being posted.

    Since I never did get around to fixing this, I tend to punt, and type

    <code>&lt;foo&gt; &amp;&lt;/foo&gt;</code>

    which is less convenient, but more reliable.
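
The escaping behaviour described in comment 12 can be imitated with a few lines of Python. This is a simplified stand-in for Markdown's handling of code spans, not the real implementation; the function name `render_code_spans` is hypothetical and only single-backtick spans are handled.

```python
import html
import re


def render_code_spans(text: str) -> str:
    """Wrap `...` spans in <code> and escape &, < and > inside them."""
    def replace_span(match: re.Match) -> str:
        escaped = html.escape(match.group(1), quote=False)
        return "<code>" + escaped + "</code>"

    return re.sub(r"`([^`]+)`", replace_span, text)


entities = render_code_spans("`&lt; &amp;`")
markup = render_code_spans("`<p> &</p>`")
print(entities)
print(markup)
```

The point is that whatever sits between the backticks is escaped one more level, so a later filter that inspects or rewrites tags and entities sees different text from what the commenter typed.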

  13. Why exactly don’t you allow (only) HTML for comments, Roger, especially on a site that targets web professionals? Considering that every syntax is confusing for beginners, and that HTML is (likely) familiar to our readers, HTML seems to be the best choice.

    (In my blog, I only received one mail concerning confusion due to HTML in comments, by a real “outsider”; I also always need to take a look at the Markdown or Textile “specs” when I need to use it for comments … it’s just strange stuff.)

    Concerning MT upgrades: Are you using any versioning system (like CVS or Subversion) for this site? This should make it far easier (apart from the general benefits …). With my modified WordPress blog, I also fear some problems will occur once I want to update the system, but database dumps and a comparison/review of all changes (via SVN diff) should do the trick, even though it’s probably quite time-consuming.

  14. June 19, 2007 by Roger Johansson (Author comment)

    Jacques:

    The problem is with the rather lame HTML sanitization filtering that MT does (in an apparently Markdown-incompatible way).

    Yeah, something does indeed seem to go wrong there. Is it fixable or at least workaroundable do you think?

    Jens:

    Why exactly don’t you allow (only) HTML for comments, Roger, especially on a site that targets web professionals?

    Because way back in time, Markdown was the only option I could find and make work that ensured reasonably valid and well-formed input. I used to serve this site as XHTML 1.0 with an application/xhtml+xml MIME type to browsers that can handle it, so I had to make sure comments were well-formed.

    I don’t use XHTML anymore, and with MTValidate now working I could allow HTML (or a subset of it), but I can’t disable Markdown without first editing all previous comments that use Markdown… and I don’t particularly feel like editing thousands of comments :-P.

    Are you using any versioning system (like CVS or Subversion) for this site?

    No, though I do keep manual backups.

    Upgrading MT may not be as problematic as I fear, but I’m not going there just yet, as I don’t see how upgrading would solve any of my problems.

  15. Yeah, something does indeed seem to go wrong there. Is it fixable or at least workaroundable do you think?

    The fix turns out to be surprisingly easy. You need to add a sanitize='0' attribute to prevent MT from munging the comment before hashing it:

    <input type="hidden" name="validated" value="<MTSHA1SaltHash><MTCommentPreviewBody convert_breaks='0' sanitize='0'>...</MTSHA1SaltHash>" />

    Thanks for emphasizing the problem. For too long, I was content to “work around” the problem by simply eschewing Markdown backticks.
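
Why sanitizing before hashing would cause intermittent mismatches can be shown with a small Python sketch. How the plugin actually computes and checks its hash is an assumption here: the salt, the allowed-tag list and the functions `sanitize`, `salted_hash` and `unchanged_since_preview` are all hypothetical and only mimic the general idea of a salted SHA-1 over the previewed comment.

```python
import hashlib
import re

SALT = "example-salt"  # hypothetical secret known only to the server
ALLOWED_TAGS = {"p", "code", "em", "strong", "a", "blockquote"}  # hypothetical


def sanitize(text: str) -> str:
    """Drop any tag whose name is not in ALLOWED_TAGS (very simplified)."""
    def keep_or_drop(match: re.Match) -> str:
        return match.group(0) if match.group(1).lower() in ALLOWED_TAGS else ""

    return re.sub(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>", keep_or_drop, text)


def salted_hash(text: str) -> str:
    """Salted SHA-1 of the comment body, as embedded in the hidden field."""
    return hashlib.sha1((SALT + text).encode("utf-8")).hexdigest()


def unchanged_since_preview(posted_body: str, hash_from_form: str) -> bool:
    """True if the posted body matches what was hashed at preview time."""
    return salted_hash(posted_body) == hash_from_form


harmless = "`<p> &</p>`"      # only allowed tags: sanitizing changes nothing
troublesome = "`<foo> &</foo>`"  # disallowed tag: sanitizing alters the text

# Hash taken over the sanitized body (the behaviour before the fix).
ok_before_fix = unchanged_since_preview(harmless, salted_hash(sanitize(harmless)))
broken_before_fix = unchanged_since_preview(troublesome, salted_hash(sanitize(troublesome)))

# Hash taken over the raw body (the effect of not sanitizing before hashing).
ok_after_fix = unchanged_since_preview(troublesome, salted_hash(troublesome))
print(ok_before_fix, broken_before_fix, ok_after_fix)
```

In this sketch a comment containing only allowed tags passes either way, which would explain why the problem showed up only sometimes; a comment with a disallowed tag in backticks passes only when the hash is taken over the unsanitized text.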

  16. June 19, 2007 by Roger Johansson (Author comment)

    The fix turns out to be surprisingly easy. You need to add a sanitize='0' attribute to prevent MT from munging the comment before hashing it:

    Yep, that seems to have done the trick. Thanks! One more beer for you :-).

  17. What I’m more interested in is: does your MT setup still block my comments if I try to put my URL in?… Let’s see if this one appears! :-)
