Use only block-level elements in blockquotes

Something I see quite a bit of around the Web is the blockquote element being slightly abused. No, I’m not talking about using it just to indent text (which I actually don’t see a lot of, despite some people claiming that to be a widespread issue), but about when blockquote elements have inline content such as text.

What, no text? That’s right, the blockquote element is not allowed to have text or inline elements as direct descendants. Only block-level (and in HTML 4.01 Strict, script) elements are allowed unless you use a Transitional Doctype, in which case both block-level and inline elements are allowed. But there are plenty of sites that use a Strict Doctype and still have blockquote elements that contain inline elements.

So to make sure your blockquotes are valid regardless of the Doctype being Strict or Transitional, always use block-level content inside blockquote elements:

    <blockquote>
    <p>This is a quoted paragraph of text, wrapped inside a <code>p</code> element, which is valid in both Strict and Transitional Doctypes.</p>
    </blockquote>
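
The p element is not the only option. As a rough sketch, assuming an HTML 4.01 Strict Doctype, other block-level children should validate as well (the quoted text and list items below are made up for illustration):

```html
<!-- Several quoted paragraphs: each one gets its own p element -->
<blockquote>
  <p>First quoted paragraph.</p>
  <p>Second quoted paragraph.</p>
</blockquote>

<!-- A quoted list: ul is block-level, so it can be a direct child -->
<blockquote>
  <ul>
    <li>First quoted item</li>
    <li>Second quoted item</li>
  </ul>
</blockquote>

<!-- Quoted text that is not really a paragraph: div is block-level and semantically neutral -->
<blockquote>
  <div>A quoted fragment of a sentence</div>
</blockquote>
```

Which element fits best depends on what is being quoted; the only requirement is that no text or inline element sits directly inside the blockquote.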

Since I use a strict Doctype on this site (HTML 4.01 Strict), I sometimes get invalid code when visitors use blockquote elements in comments. Most people get it right since they use Markdown syntax, which automatically creates correct blockquotes. However, some enter the text they want to quote directly inside blockquote elements, like this:

    <blockquote>
    This is a quoted paragraph of text, directly inside a <code>blockquote</code> element, which makes this quote valid only in Transitional Doctypes.
    </blockquote>

And unfortunately that slips past Markdown and leads to the page becoming invalid. Actually I am a little surprised to see some people use the blockquote element this way. Are you still using transitional Doctypes or has it been a while since you used the HTML validator? ;-)

So, to help ensure validity, please use block-level elements inside blockquotes whenever you post a comment on a blog or a forum that allows HTML.

Posted on May 16, 2007 in (X)HTML, Web Standards

Comments

  1. I did not know that! Thanks for pointing it out! I don’t tend to have very many quotes in the sites I do so that’s probably why. The few pages that include quotes don’t happen to be among the pages that get validated after the original templates are built. Or they will be in a blog post or something that is put up long after the original design was done (and when most of the validation happens).

  2. That’s right, the blockquote element is not allowed to contain text.

    Now that isn’t entirely accurate, is it? It would be better to say that the direct descendants of a blockquote element must be block-level elements.

  3. I had to laugh when I saw this article. Just yesterday I was complaining bitterly because my blockquote was failing validation. I now understand why I had to insert my p tags, but the question remains, from a semantics perspective shouldn’t a blockquote only contain text? I really can’t think of the need to include anything but text within a blockquote when it is used for marking up a block quotation.

  4. May 16, 2007 by Roger Johansson (Author comment)

    Simon: Yes, that’s a correct observation. I was trying to be funny, but I’ll update it to prevent any misunderstanding.

    Mark: Well, the element is called block-quote :-).

  5. At first, your article seemed like a complete no-brainer. Then I checked a couple of my recent articles and comments… Such an easy mistake to make.

    Great write up, and great reminder.

  6. Roger - or anyone else - can you give me a decent reason why blockquote should contain only block level elements? There seems nothing structurally or meaningfully wrong with

    <blockquote>...text...</blockquote>
    
  7. May 17, 2007 by AdamA

    A people that values its privileges above its principles soon loses both.

    Haha - I invalidated you ;)

    (Not any more ;-). /R)

  8. It’s always good to remind people of such tidbits, but as for your validation problem, remember: never trust user input (if validity is important to you).

  9. I had no idea about this either. I was wondering why I was having issues with p tags appearing half way through a blockquote in my WordPress output! I was getting frustrated as I thought it was an error in the program - thanks to this post I know the error is in fact my own. If I use the proper structure, everything will be fine.

    Thanks for the info. OJ

  10. To answer your last question, yes for many CMS-based sites I still use a transitional Doctype. I find that with the current state of most WYSIWYG HTML editors, this is the best way to keep the website validating correctly (according to the Doctype anyway), unless there’s something I’m missing. From personal experience, my clients are not quite up to using something like Markdown just yet.

    Having said that, when hand-coding sites, yes, I’m using a strict doctype where possible.

  11. This is one of the occasions where I don’t understand the rules. What detriment does inline content as a direct descendant of a blocklevel element have? It seems arbitrary. I don’t see a drawback to doing it this way, but I don’t see the significance either. Could someone explain why beyond ‘so it validates’. I agree with validation as a way to standardize markup, but some of it just seems arbitrary.

  12. Roger - or anyone else - can you give me a decent reason why blockquote should contain only block level elements?

    Maybe because a blockquote should be associated with a citation for the person being quoted. Obviously, the quote and the citation aren’t part of the same paragraph.

  13. May 17, 2007 by James

    A block quote (or long quotation in MLA speak) is a compositional writing element first, as is its counterpart, the in-line quotation. In composition, the block quote signals a longer quotation (usually at least several lines on paper) that is set off as a ‘block’ of text as opposed to a short quotation that will not interrupt the regular paragraph formatting.

    From my experience studying and teaching composition, block quotes in essays often contain text that would not be a block level element (like a paragraph) in html. Block quotes just as validly, and more often, contain a sentence or two (or even part of one long sentence) as they contain a paragraph or two. A sentence, or portion of one, that is contained in a larger paragraph should hardly be seen as a block-level element.

    I am constantly reminded to use html tags to describe what’s going on in terms of compositional elements: p-tags should surround paragraphs, not just random sentences; h-tags should only surround headings, not just larger text; and em-tags should surround emphasized text, not titles that are italicized.

    Block quotes are quotations set off in a block of text, perhaps a block-level element in and of themselves, but surely not meant to be read as ‘a quotational element meant to only contain block-level elements.’ Using it in that way doesn’t conform to the general conventions of writing composition anyway.

    John—absolutely the quotation is often part of the paragraph (even then it is often the object of a single sentence—at least that’s what my college-level grammar book tells me). That block quotes are introduced by a colon, according to MLA style, indicates that they are the object of the introducing sentence. Quotations rarely stand on their own (except as epigraphs).

  14. @ James: You may be too narrow in your definition of a paragraph. Consider Webster’s

    A brief composition complete in one typographical section

    In html, I treat the p element as the default for any bit of text that isn’t better described by another element. The blockquote says that what follows is a block of quoted material. That material has structure itself, and a paragraph is one structure.

    cheers,

    gary

  15. I’m curious as to why you allow HTML if you’re pushing the use of Markdown. I’m even more curious as to whether or not you restrict the HTML to a heavily filtered subset.

    As Sébastien Guillon mentioned, and aiming this as a general note for any passers-by: one has to be really careful about introducing the potential for XSS attacks by allowing HTML in comments. Also bear in mind that comment forms are particularly dangerous in this context, as any comments added permanently modify the page. If an attack on a comment form is successful, its implications will be seen by all visitors to that page, not simply the ones who are sent a specially crafted URL.

  16. May 17, 2007 by Roger Johansson (Author comment)

    Richard:

    Roger - or anyone else - can you give me a decent reason why blockquote should contain only block level elements?

    I don’t know what the reasoning behind it is. It’s the same for form elements. Maybe someone in the HTML Working Group knows.

    Michael:

    I find that with the current state of most WYSIWYG HTML editors, this is the best way to keep the website validating correctly (according to the Doctype anyway), unless there’s something I’m missing.

    It depends on the editor. Some can be configured to only allow strict markup. At my day job we use a custom filter to remove any presentational stuff that slips through by accident.

    Ed:

    I’m curious as to why you allow HTML if you’re pushing the use of Markdown. I’m even more curious as to whether or not you restrict the HTML to a heavily filtered subset.

    Markdown allows the HTML it recognises, so only a subset can be used.

    Ehrm. Make that could. I just realised that I don’t need to allow HTML in comments. I think I switched that option on way back and thought Markdown required it. Not so. Ha! No more invalid blockquotes here :-).

  17. Well guess I also learnt something new today. I guess with text it needs to be wrapped around by p right?

    Is using span ok?? I personally do not use it but I have seen it used…

    Thanks and I am sorry if it’s a bit basic :(

  18. May 17, 2007 by Roger Johansson (Author comment)

    Jermayn: Yes, p is the natural block level element to use in blockquotes. span is an inline element though, so it won’t do.

  19. I believe the reasoning behind Markdown allowing a subset of HTML is that John Gruber didn’t want to replicate every useful HTML element when the HTML syntax wasn’t particularly onerous.

    Turning off HTML entirely in the comments should prevent any validation errors, but it will also stop your commenters using HTML elements that don’t have a Markdown equivalent: and spring to mind.

  20. In my opinion it is the fault of the software, which doesn’t handle that. You can’t be serious if you say that everyone who wants to comment here should look into your source code to see if you are using a strict or transitional doctype.

    Why not let your software parse the comment and add the p if you want to satisfy the validator?

  21. Actually, this is a reason that I tend to stick with transitional doctypes.

    I can’t see any rational semantic reason why a blockquote needs to contain a p, unless it’s a quote of more than one paragraph. Indeed, as James points out, p may actually be semantically wrong in some cases.

    What would be the downside if blockquote were allowed to contain either inline or block-level content, the way that it does work under transitional doctypes?

    Doing something just because the W3C says so seems like a poor argument - especially after you’ve just been complaining about some of the silly things they’re saying about HTML5.

  22. May 17, 2007 by Roger Johansson (Author comment)

    pauldwaite:

    Turning off HTML entirely in the comments should prevent any validation errors, but it will also stop your commenters using HTML elements that don’t have a Markdown equivalent

    Right, but I don’t really see a huge need for those elements in comments.

    Jeena:

    You can’t be serious if you say that everyone who wants to comment here should look into your source code to see if you are using a strict or transitional doctype.

    No, no, of course not. What I’m saying is that you need to make sure to use p elements (or other block level elements) in blockquotes for strict doctypes, and since that is also allowed in transitional you might as well always do it.

    Chris:

    I can’t see any rational semantic reason why a blockquote needs to contain a p, unless it’s a quote of more than one paragraph.

    Me neither.

    What would be the downside if blockquote were allowed to contain either inline or block-level content, the way that it does work under transitional doctypes?

    I don’t know. None, as far as I can tell.

    Doing something just because the W3C says so seems like a poor argument

    I’m not doing it just because the W3C says so - I’m doing it so I can use validation as a QA tool without having to manually filter out false positives.

  23. May 17, 2007 by Roger Johansson (Author comment)

    Ok, I just spent a while trying to find out exactly why blockquote elements (plus the body and form elements) must only contain block level elements in strict doctypes. The only explanation I can find is that the content model of those elements changed between HTML 3.2 and 4.

    HTML 5 has the same restrictions.

  24. I don’t allow any HTML in my comments, only Markdown.

    A problem I have with it though is that the blockquote-syntax in Markdown uses the > character.
    Since i run all comments through htmlentities() before markdown(), all the Markdown-blockquotes will be turned in to > and thus not work.

    How did you solve this, Roger? (If you do use htmlentities…) (which I gathered from “No more invalid blockquotes here”).

  25. Woops, it looked good in the preview, but it’s supposed to read: all the Markdown-blockquotes will be turned in to &gt; and thus not work.

  26. May 17, 2007 by Roger Johansson (Author comment)

    Andreas: I don’t use htmlentities(). I’m not sure it’s possible to mix PHP with Perl (I use Movable Type). It’s probably a whole lot easier when the entire back-end is PHP.

  27. May 17, 2007 by Aldrik

    Chris:

    I can’t see any rational semantic reason why a blockquote needs to contain a p, unless it’s a quote of more than one paragraph.

    Well the blockquote element is for quoting blocks of content (paragraphs, and paragraphs should always be in a p element). If you are quoting something shorter than a paragraph (like only a sentence) you should be using the q (short quote) element.

  28. Only use block-level elements in blockquotes

    As compared to “only draw block-level elements in blockquotes” or “only eat block-level elements in blockquotes”?

    I think you may have meant to use “Use only block-level elements in blockquotes”.

    :)

  29. May 17, 2007 by Roger Johansson (Author comment)

    Kim:

    I think you may have meant to use “Use only block-level elements in blockquotes”.

    Yep. I guess it’s easier on some days than others to tell that English isn’t my native language.

  30. I guess I’ve been doing it correctly. I do this:

    <blockquote cite="URL-of-Source">

    <p>This is the text I’m quoting <cite>&#8212; Who Said</cite></p>

    </blockquote>

    I add the cite attribute when applicable to link the blockquote to the online source, and the cite element I use to show the quoted person’s name. The cite element I add to the last paragraph, but I suppose it should/could go in its own paragraph. It looks the same either way with CSS (float to the right, make bold, add color, pad it, and I also add an end-quote background image).

    (Sorry, backticks didn’t seem to form code for me.)

  31. i run all comments through htmlentities() before markdown()

    Doesn’t Markdown handle everything that htmlentities does? (I’m unfamiliar with htmlentities.)

  32. I’m not sure why so many commenters seem to think that the requirement to use a block-level element means they have to wrap their quotes in a <p>. If the text you are placing in a <blockquote> is not, semantically speaking, a paragraph, and none of the other block-level elements are appropriate, then place it in a <div>, which is also a block-level element but is semantically neutral.

  33. Filtering HTML out of Markdown-formatted text isn’t an easy task, as various Markdown constructs use less-than and greater-than signs and because code spans and code blocks expect unescaped content. Blockquotes seem to pass through your HTML filter easily, but not automatic links (like <http://www.michelf.com/>) and any HTML snippet you may put in a Markdown code span or code block.

  34. May 18, 2007 by Su

    Part of the trouble with this, at least as far as Movable Type, is that the built-in Convert Line Breaks text filter is pretty broken on this point, and requires some backflips to produce proper blockquotes. I repeatedly forget the trick, but I think the requirement is that your input look like this:

    This is the copy before the blockquote.
    
    <blockquote>
    
    This is the first quoted paragraph
    
    This is another quoted paragraph
    
    </blockquote>
    
    This is the following text.
    

    If any of those lines touch, you end up with a line break, and some interestingly broken P tags, depending upon exactly what you did. It’s almost easier to just do all the markup manually.

    Textile produces correct single-paragraph quotes with no trouble, but if there are multiple paragraphs, it requires you to use modified syntax to open the quote, and then manually indicate the following paragraph (normally not required) to make it work right. I’ll let you guess how many people actually bother. I think this is an extension, and the original Textile spec can’t handle the situation at all.

  35. May 18, 2007 by Su

    Well, I wasn’t quite able to trick the filter. Imagine those doubled blank lines before and after the quote contain the blockquote tags, each with empty lines above and below.

  36. May 18, 2007 by Roger Johansson (Author comment)

    Mike:

    Sorry, backticks didn’t seem to form code for me.

    They don’t work for me either. I think it’s a problem with the Markdown plugin for Movable Type.

    Nick:

    If the text you are placing in a blockquote is not, semantically speaking, a paragraph, and none of the other block-level elements are appropriate, then place it in a div which is also a block-level element but is semantically neutral.

    Good observation. I probably should have mentioned examples of other block level elements in the article.

    Michel:

    Blockquotes seem to pass through your HTML filter easily, but not automatic links (like <http://www.michelf.com/>) and any HTML snippet you may put in a Markdown code span or code block.

    I think I was unclear: I do not use an HTML filter here, only Markdown. Automatic linking is off due to spam.

    Su: I tried fixing the code in your comment, but I can’t get it to work either.

  37. @Aldrik, you’ve just used a blockquote to quote a single sentence from my comment (i.e. less than a paragraph), thus breaking your own rule about less-than-a-paragraph should be quoted inline, and making my point for me.

    If we can’t see the semantic value in including an extraneous p element, why support it?

  38. May 18, 2007 by Aldrik

    Chris

    you’ve just used a blockquote to quote a single sentence from my comment (i.e. less than a paragraph), thus breaking your own rule about less-than-a-paragraph should be quoted inline, and making my point for me.

    If we can’t see the semantic value in including an extraneous p element, why support it?

    Yeah I know it’s just that the markdown syntax page doesn’t show to way of adding a q tag (so a blockquote was the next best thing).

    I don’t know how you can not see the semantic value of putting a paragraph(s) in a p element(s) at all times. You may think it may bloat the code a little bit but you may be quoting a list, a chunk of code, etc, so you can’t assume it’s going to be a paragraph.

  39. Errr… Roger?

    I think I made your pages invalid more than once, for which I humbly apologize! :-)

    I didn’t know that if I want to use blockquote, and, for example, include just one sentence into it, I must use a paragraph tag inside…

    …but!

    It just occurred to me!

    Why should we add “P” tags in the BLOCKQUOTE element when posting comments, when usually the P tag is added automatically when posting comments outside any BLOCKQUOTE elements?!

    OK, in this case, the “P” should be added automatically also in the case when we make a blockquote element, isn’t it?

    Paragraphs and line breaks are automatically created in any contemporary blog software. So, if I want to create a paragraph, I simply press ENTER two times, like this:

    ‘P’ And lo and behold! here it is, a new shiny PARAGRAPH! :-) ‘/P’

    Now, if I make a blockquote, should be the same, right?

    I mean, because P & BR are automatically created, the P inside a BLOCKQUOTE could be automatically created as well, in case it is really needed by the 4.01 Strict?

    Reason why I do not enclose all of my paragraphs in P and /P: because MovableType/WordPress/etc. automatically creates them for me!

    Reason why I shouldn’t create manually P /P in the BLOCKQUOTE: the same! (or at least, this is my reasoning…)

    What do you say? :-)

    Cheers, M.

  40. May 18, 2007 by Aldrik

    Typo/Should-be: doesn’t show a way of adding a q tag. PS. I also couldn’t add a cite tag for your name (doesn’t mean I didn’t want to).

  41. //Sidenote: Would be really nice if you would allow at least the use of the following HTML elements: quote, blockquote, em, strong, a href…

    Reasons:

    1) They are safe to use, and no harm can be done through them.
    2) They are very popular in the blogging world.
    3) Most of the time, when seeing a blog, people /expect/ these to work, even if HTML is not allowed.

    For example, WordPress, which became blogging platform #1 lately, by default allows these 5-6 basic formatting elements to be used.

    Most of the blogs I read daily, allow you to use them.

    And when somewhere we meet a syntax which is different (like, in the case of your blog), it puzzles a little.

    Yes, you have some advantages to do so, I am sure of it.

    But there are also some disadvantages to that - namely, the almost-standard (in the blogging world) safe use of A HREF, EM, STRONG, etc. …and this is what people expect:)

    My $ 0.02, M.

  42. Yep. I guess it’s easier on some days than others to tell that English isn’t my native language.

    Trust me. Getting that wrong means you are on the same level as most native speakers. :)

  43. I think I was unclear: I do not use an HTML filter here, only Markdown.

    Ok. But then how did you deactivate the usage of blockquote tags then, if not by activating a filter for HTML tags?

    Even if the filter is built-in to Movable Type, it doesn’t mean it can’t interfere with Markdown. And since I’m not able to create a Markdown code block or code span with HTML tags in it, my guess is that some filter removes the tags before they even reach Markdown.

    For instance, here is some sample HTML code in which I put the word “emphasis” surrounded by two em tags (one opening, one closing), written according to Markdown’s rules for code blocks (four-space indent):

    <em>emphasis</em>
    

    Let’s bet the two em tags will have disappeared once posted. (In fact, no need to post, previewing is enough. The code block is there, but the tags disappear.) So basically, you can’t post HTML code samples now, even using the Markdown syntax.

  44. The blockquote is a presentational model. It’s being abused and producing invalid markup without a clear semantic meaning. I stick to using <q> tags inside paragraphs instead; it makes more sense to me.

  45. May 19, 2007 by David

    My guess as to why text inside of a blockquote should be marked up within p tags is that you may have more than one paragraph or section of text inside 1 blockquote. Without the p tag you’d have to use br’s to separate the text into 2 or more sections. And we all know that is not the correct way.

  46. May 19, 2007 by Roger Johansson (Author comment)

    Michel:

    I think I made your pages invalid more than once, for which I humbly apologize! :-)

    Don’t worry about it. Just don’t do it again ;-).

    the “P” should be added automatically also in the case when we make a blockquote element, isn’t it

    Yes, and it is added automatically if you use Markdown syntax.

    Would be really nice if you would allow at least the use of the following HTML elements

    Yes, I agree. But it doesn’t seem to be possible to get Movable Type to do so while using Markdown unless I allow people to post HTML, which I don’t want. I don’t know enough Perl to hack the comment engine or I would have done so long ago, if only to fix the stupid HTML entity removal that causes problems for way too many commenters.

    Kim:

    Getting that wrong means you are on the same level as most native speakers. :)

    :-D

    Michel Fortin:

    But then how did you deactivate the usage of blockquote tags then, if not by activating a filter for HTML tags?

    All I did was uncheck “Allow HTML in comments”. I guess that interferes with Markdown somehow.

    So basically, you can’t post HTML code samples now, even using the Markdown syntax.

    Right, I noticed that. That’s probably why I had Movable Type configured to allow HTML. *sigh*

  47. I think we should allow both inline and block level content for blockquote and form in HTML5, but not both at the same time.

  48. May 19, 2007 by Roger Johansson (Author comment)

    Regarding Markdown and HTML in comments: I discovered that there is indeed a filter active: Movable Type’s sanitize function. It only allows some HTML elements and attributes and strips everything else. I updated the list of allowed elements and attributes to match what Markdown outputs.

    Anne: What is the reason for not allowing both inline and block level content? Does it make parsing more complicated?

  49. I find it quite ugly that there are elements whose content model allow you to freely mix inline- and block-level content

    <div>inline content, juxtaposed <p>with block-level content.</p></div>

    I suppose it’s far too late to attempt to make such things non-conformant. But many things would be much simpler if there were a clean separation between block and inline modes.

    P.S.: your textarea fails to escape markup, so “&lt;” becomes “<”, after previewing. For obvious reasons, this is bad.

  50. Does this mean the same goes for DD elements? W3 says the following about it:

    ‘Definition lists vary only slightly from other types of lists in that list items consist of two parts: a term and a description. The term is given by the dt element and is restricted to inline content. The description is given with a dd element that contains block-level content.’

    Which means the dd should only contain block level content. Does this mean plain text should always be placed between [P]’s inside a [DD] or [BLOCKQUOTE]?

  51. I don’t use block-quote elements.

  52. I can’t see any rational semantic reason why a blockquote needs to contain a p, unless it’s a quote of more than one paragraph.

    Me neither.

    The way I understand it through cumulative knowledge (and I can’t point to a single source) is that certain block level items like body, div and blockquote are considered flow-level block elements (defining, strictly speaking, sections and not the blocks themselves) and as such are not supposed to contain inline elements or text nodes as direct descendants.

  53. “block level items like body, div and blockquote are considered flow-level block elements”

    I think you got it wrong there, body and div are flow-level, but not blockquote and form. It’s all in the doctype.

    Thus, putting a span directly under body is allowed, but not under a blockquote. Don’t ask me why.

  54. Umh, no.

    I think you got it wrong there, body and div are flow-level, but not blockquote and form. It’s all in the doctype.

    Umh… no.

    The content model of <body> is (%block;|SCRIPT), just like <blockquote>. The content model of <div> (and, to answer someone else’s question, <dd>) is %flow;, which is to say that it can contain both block-level and inline content.

    It’s really not that hard to read the DTD. So there’s no reason not to get this right.
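
    A minimal illustration of those content models, assuming an HTML 4.01 Strict Doctype (the sample text is made up):

    ```html
    <!-- Valid: div and dd accept %flow;, so text may be a direct child -->
    <div>Text directly inside a div.</div>
    <dl>
      <dt>Term</dt>
      <dd>Text directly inside a dd.</dd>
    </dl>

    <!-- Invalid: blockquote accepts only (%block;|SCRIPT), so bare text is not allowed -->
    <blockquote>Text directly inside a blockquote.</blockquote>

    <!-- Valid: the text now sits inside a block-level child -->
    <blockquote><p>Text inside a p inside a blockquote.</p></blockquote>
    ```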

  55. Sorry Jacques, my mistake. Replace body with div in my comment and you’ll get my point. Sorry for sounding harsh, my apologies.

  56. More of W3C’s crack-induced stupidity, I say.

  57. Roger, neither parsing nor styling is complicated by it. It just makes it harder to determine what it is. If it contains both it’s hard to determine whether you are dealing with a section or a paragraph. Are there good use cases for allowing both block and inline at the same time?

  58. May 27, 2007 by Roger Johansson (Author comment)

    Anne:

    Are there good use cases for allowing both block and inline at the same time?

    Not that I can think of (other than allowing authors who don’t want to learn HTML to create conformant markup, which I strongly oppose). I was asking out of genuine curiosity because I thought there might be some reason known only to browser developers.

  59. Just another reminder. If your site is made with PHP server-side, one can use the tidy-extension. This is how I use it:

    If a user submits content with small fixable errors (“warning” in tidy), I’ll let tidy fix it and store it in my DB. If Tidy reports an “error”, I’ll refuse the content.

    This has side benefits too. XSS-attacks become much harder if you require valid markup. Just remove all script-tags, pseudo-protocols and event-handlers after the markup has been proven to be valid. In combination with checks for attacks through sneaky encodings (UTF-7, etc) this is as near 100 % safe as you can get.

  60. As some other posters have said, don’t forget the CITE attribute on your blockquotes. I know none of the browsers do anything with it but, with a little javascript, you can make it usable. For example, I made a little script to show a hyperlinked tooltip that uses the cite att. http://willcode4beer.com/tips.jsp?set=blockquoteHover

    There are a lot of other possibilities for it too.
