Get element text, including alt text for images, with JavaScript

Sometimes I find myself wanting to get the text contents of an element and its descendants. There is a DOM property called textContent that can be used for this. There is also jQuery’s text() method. Unfortunately neither returns what I want.

In both cases, the alt text of elements that can have alt attributes is left out of the returned string. In my opinion, alt text is the text content of an img, input[type=image] or area element and should be returned by methods like these. I also find it a bit weird that they return the contents of script elements.
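To make the complaint concrete, here is a rough model of how textContent behaves for an element: it concatenates the data of every descendant text node and never looks at attributes. This is only a sketch that runs outside a browser; txt, elem and modelTextContent are hypothetical helpers using plain objects in place of real DOM nodes.

```javascript
// Plain objects standing in for DOM nodes (hypothetical helpers, not a DOM API).
function txt(value) {
    return { nodeType: 3, nodeValue: value, childNodes: [] };
}
function elem(tagName, attrs, children) {
    return {
        nodeType: 1,
        tagName: tagName.toUpperCase(),
        attrs: attrs || {},
        childNodes: children || []
    };
}

// Rough model of textContent: join the data of all descendant text
// (and CDATA) nodes in document order. Attributes are never consulted,
// and script elements get no special treatment.
function modelTextContent(node) {
    if (node.nodeType === 3 || node.nodeType === 4) {
        return node.nodeValue;
    }
    var out = '';
    for (var i = 0; i < node.childNodes.length; i++) {
        out += modelTextContent(node.childNodes[i]);
    }
    return out;
}

// Stands in for: <p>See <img alt="the logo"><script>var x = 1;</script> here</p>
var sample = elem('p', {}, [
    txt('See '),
    elem('img', { alt: 'the logo' }),
    elem('script', {}, [txt('var x = 1;')]),
    txt(' here')
]);

var result = modelTextContent(sample);
// result === 'See var x = 1; here'
```

The alt text is gone and the script source is included, which is the opposite of what I want.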

Not having any luck finding a method that includes alternative text and omits script elements when getting text content, I wrote my own:

var getElementText = function(el) {
    var text = '';
    if (el.nodeType === 3 || el.nodeType === 4) {
        // Text node (3) or CDATA node (4): return its text
        text = el.nodeValue;
    } else if (el.nodeType === 1) {
        // Element node (1): look at the tag name and, for inputs, the type
        var tag = el.tagName.toLowerCase();
        var type = (el.getAttribute('type') || '').toLowerCase();
        if (tag === 'img' || tag === 'area' || (tag === 'input' && type === 'image')) {
            // img, input[type=image], or area element: return its alt text
            text = el.getAttribute('alt') || '';
        } else if (tag !== 'script' && tag !== 'style') {
            // Traverse children unless this is a script or style element
            var children = el.childNodes;
            for (var i = 0, l = children.length; i < l; i++) {
                text += getElementText(children[i]);
            }
        }
    }
    return text;
};

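It expects the argument to be a reference to an element. Text and CDATA nodes work too, but any other node type, including the document node itself, returns an empty string.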

To get the text contents of an entire document, you’d call it like this:

var bodyText = getElementText(document.body);
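For a quick check outside a browser, the sketch below runs a copy of getElementText against plain objects that stand in for DOM nodes. The txt and elem helpers are hypothetical stand-ins that provide only the properties the function reads (nodeType, nodeValue, tagName, childNodes, getAttribute); they are not part of any DOM API.

```javascript
// Copy of getElementText from above, repeated so this snippet runs on its own.
var getElementText = function(el) {
    var text = '';
    if (el.nodeType === 3 || el.nodeType === 4) {
        text = el.nodeValue;
    } else if (el.nodeType === 1) {
        var tag = el.tagName.toLowerCase();
        var type = (el.getAttribute('type') || '').toLowerCase();
        if (tag === 'img' || tag === 'area' || (tag === 'input' && type === 'image')) {
            text = el.getAttribute('alt') || '';
        } else if (tag !== 'script' && tag !== 'style') {
            for (var i = 0; i < el.childNodes.length; i++) {
                text += getElementText(el.childNodes[i]);
            }
        }
    }
    return text;
};

// Hypothetical stand-ins for a text node and an element node.
function txt(value) {
    return { nodeType: 3, nodeValue: value, childNodes: [] };
}
function elem(tagName, attrs, children) {
    attrs = attrs || {};
    return {
        nodeType: 1,
        tagName: tagName.toUpperCase(),
        childNodes: children || [],
        // Like the DOM method, return null for a missing attribute
        getAttribute: function(name) {
            return Object.prototype.hasOwnProperty.call(attrs, name) ? attrs[name] : null;
        }
    };
}

// Stands in for: <p>See <img alt="the logo"><script>var x = 1;</script> here</p>
var sample = elem('p', {}, [
    txt('See '),
    elem('img', { alt: 'the logo' }),
    elem('script', {}, [txt('var x = 1;')]),
    txt(' here')
]);

var sampleText = getElementText(sample);
// sampleText === 'See the logo here'
```

The alt text of the image is included and the script source is skipped.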

There could be better ways of doing this, of course, but none that I have been able to find.

Posted on May 16, 2011 in JavaScript