Cleaner HTML from the WordPress wp_list_pages function

In WordPress there are several ways of creating navigation menus. One way is to use the wp_list_pages function to output a list of pages. It works, but unfortunately the resulting HTML is less than ideal.

For many reasons I like my HTML tidy, without redundant class, id, and title attributes. I’ve found two different approaches to cleaning up the HTML created by the wp_list_pages function, and I’ll explain both here.

Method one: using filters and regular expressions

The approach that may be easiest to understand, and the first approach I tried, is to let wp_list_pages do its job and then clean up the markup by using a bunch of regular expressions. I’m not sure how efficient this is from a performance point of view, but it does the job.

For this approach you need to add the following to your functions.php file:

function clean_wp_list_pages($menu) {
    // Remove redundant title attributes
    $menu = remove_title_attributes($menu);
    // Remove protocol and domain name from href values
    $menu = make_href_root_relative($menu);
    // Give the list items containing the current item or one of its ancestors a class name
    $menu = preg_replace('/class="[^"]*current_page[^"]*"/','class="sel"',$menu);
    // Remove all other class names
    $menu = preg_replace('/ class=(["\'])(?!sel).*?\1/','',$menu);
    // Give the current link and the links to its ancestors a class name and wrap their content in a strong element
    $menu = preg_replace('/class="sel"><a(.*?)>(.*?)<\/a>/','class="sel"><a$1 class="sel"><strong>$2</strong></a>',$menu);
    return $menu;
}
add_filter( 'wp_list_pages', 'clean_wp_list_pages' );

Please note that this filter function uses the remove_title_attributes and make_href_root_relative functions, which I have previously described in Removing title attributes from WordPress links and How to make WordPress URLs root relative, respectively. The filter is applied each time you call wp_list_pages.
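Since those two helpers are not shown in this post, here is a minimal sketch of what stand-ins could look like. Both function bodies are assumptions made for illustration, not the versions from the posts mentioned above. The URL helper simply strips the protocol and host from any absolute URL it finds, which is only reasonable if the input contains nothing but links to your own site.

```php
<?php
// Hypothetical stand-in, not the original helper: strips title attributes,
// whether they are quoted with " or '.
if ( ! function_exists( 'remove_title_attributes' ) ) {
    function remove_title_attributes( $input ) {
        return preg_replace( '/\s+title=(["\']).*?\1/', '', $input );
    }
}

// Hypothetical stand-in, not the original helper: removes the protocol and
// host from absolute URLs. Accepts a bare URL or a chunk of HTML.
if ( ! function_exists( 'make_href_root_relative' ) ) {
    function make_href_root_relative( $input ) {
        return preg_replace( '#https?://[^/"\'\s]+(?=/)#i', '', $input );
    }
}

$link = '<a href="http://example.com/page-1/" title="Page 1">Page 1</a>';
echo make_href_root_relative( remove_title_attributes( $link ) ), "\n";
// Prints: <a href="/page-1/">Page 1</a>
```

The function_exists checks are there so the sketch does not clash with real versions of the helpers if they are already defined.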

While using regular expressions this way works when you know what the output to filter looks like, it feels a bit kludgy, so I thought that there must be a “cleaner” way. And there is.

Method two: using a custom walker class

The cleaner approach is to use a custom “walker” class to generate the HTML you want. How this works took a bit longer to wrap my head around, but this method lets me get rid of most regular expressions and alter the markup directly.

To use this method, add the following to your functions.php file:

class Clean_Walker extends Walker_Page {
    function start_lvl(&$output, $depth = 0, $args = array()) {
        $indent = str_repeat("\t", $depth);
        $output .= "\n$indent<ul>\n";
    }
    function start_el(&$output, $page, $depth = 0, $args = array(), $current_page = 0) {
        if ( $depth )
            $indent = str_repeat("\t", $depth);
        else
            $indent = '';
        extract($args, EXTR_SKIP);
        $class_attr = '';
        if ( !empty($current_page) ) {
            $_current_page = get_page( $current_page );
            if ( (isset($_current_page->ancestors) && in_array($page->ID, (array) $_current_page->ancestors)) || ( $page->ID == $current_page ) || ( $_current_page && $page->ID == $_current_page->post_parent ) ) {
                $class_attr = 'sel';
            }
        } elseif ( (is_single() || is_archive()) && ($page->ID == get_option('page_for_posts')) ) {
            $class_attr = 'sel';
        }
        if ( $class_attr != '' ) {
            $class_attr = ' class="' . $class_attr . '"';
            $link_before .= '<strong>';
            $link_after = '</strong>' . $link_after;
        }
        $output .= $indent . '<li' . $class_attr . '><a href="' . make_href_root_relative(get_page_link($page->ID)) . '"' . $class_attr . '>' . $link_before . apply_filters( 'the_title', $page->post_title, $page->ID ) . $link_after . '</a>';

        if ( !empty($show_date) ) {
            if ( 'modified' == $show_date )
                $time = $page->post_modified;
            else
                $time = $page->post_date;
            $output .= " " . mysql2date($date_format, $time);
        }
    }
}

This is basically the default WordPress code for the start_lvl and start_el methods of the Walker_Page class, changed to output clean markup. The Walker_Page class can be found in the file wp-includes/classes.php (look for it in wp-includes/post-template.php as of WordPress 3.1).

I also made it wrap the link text of the selected item in a strong element to avoid relying on CSS alone to convey which item is selected.
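The condition in start_el that decides which items get the sel class is easier to follow in isolation. Here is a minimal sketch that uses plain values instead of WordPress page objects. The function name and its parameters are hypothetical and exist only for this illustration. The IDs are the ones from the example markup further down.

```php
<?php
// Hypothetical helper that mirrors the three checks in start_el:
// $page_id    ID of the list item being rendered
// $current_id ID of the page being viewed
// $ancestors  IDs of the current page's ancestors
// $parent_id  ID of the current page's direct parent (0 if it has none)
function is_selected_item( $page_id, $current_id, array $ancestors, $parent_id = 0 ) {
    return in_array( $page_id, $ancestors )  // item is an ancestor of the current page
        || $page_id == $current_id           // item is the current page
        || $page_id == $parent_id;           // item is the current page's parent
}

$current_id = 4987;       // Sublevel 1-1 is being viewed
$ancestors  = array( 2 ); // Page 1
foreach ( array( 2, 4987, 4989, 1630 ) as $id ) {
    echo $id, ': ', is_selected_item( $id, $current_id, $ancestors, 2 ) ? 'sel' : '-', "\n";
}
// Prints:
// 2: sel
// 4987: sel
// 4989: -
// 1630: -
```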

To use this you need to call wp_list_pages with a walker parameter, like this:

<?php
$walker = new Clean_Walker();
wp_list_pages( array(
    'title_li' => '',
    'walker' => $walker,
    ) );
?>

Example markup before and after

Both of the approaches described here create the same markup. So how does that markup differ from the default? Here’s a simple example of a few pages listed by calling the unfiltered wp_list_pages function like this:

<ul id="nav">
<?php wp_list_pages('title_li=&'); ?>
</ul>

The HTML WordPress outputs will look something like this:

<ul id="nav">
    <li class="page_item page-item-2 current_page_ancestor current_page_parent"><a href="http://example.com/page-1/" title="Page 1">Page 1</a>
        <ul class='children'>
            <li class="page_item page-item-4987 current_page_item"><a href="http://example.com/page-1/sublevel-1-1/" title="Sublevel 1-1">Sublevel 1-1</a></li>
            <li class="page_item page-item-4989"><a href="http://example.com/page-1/sublevel-1-2/" title="Sublevel 1-2">Sublevel 1-2</a></li>
        </ul>
    </li>
    <li class="page_item page-item-1630"><a href="http://example.com/page-2/" title="Page 2">Page 2</a></li>
    <li class="page_item page-item-1633"><a href="http://example.com/page-3/" title="Page 3">Page 3</a></li>
</ul>

This markup is not what I want. It contains absolute URLs, loads of class names I have no use for, redundant title attributes, and inconsistent use of ' and " for quoting attribute values.

After using either of the approaches described here, the markup will look like this instead:

<ul id="nav">
    <li class="sel"><a href="/page-1/" class="sel"><strong>Page 1</strong></a>
        <ul>
            <li class="sel"><a href="/page-1/sublevel-1-1/" class="sel"><strong>Sublevel 1-1</strong></a></li>
            <li><a href="/page-1/sublevel-1-2/">Sublevel 1-2</a></li>
        </ul>
    </li>
    <li><a href="/page-2/">Page 2</a></li>
    <li><a href="/page-3/">Page 3</a></li>
</ul>

Much cleaner. And if you’re not happy with that either, it’s just a matter of tweaking either of the techniques until the markup is what you want.
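As one example of such a tweak, here is a rough sketch of the class clean-up steps from method one with the class name turned into a parameter, so you can use something other than sel. The function name clean_menu_classes and its $class parameter are hypothetical. The sketch assumes the class name is a plain word with no $ or \ characters, and it does not remove title attributes or rewrite URLs.

```php
<?php
// Hypothetical variant of the class clean-up steps with a configurable class name.
function clean_menu_classes( $menu, $class = 'sel' ) {
    $quoted = preg_quote( $class, '/' );
    // Collapse any class attribute that mentions current_page into the chosen class
    $menu = preg_replace( '/class="[^"]*current_page[^"]*"/', 'class="' . $class . '"', $menu );
    // Remove every class attribute whose value is not exactly the chosen class
    $menu = preg_replace( '/ class=(["\'])(?!' . $quoted . '\1).*?\1/', '', $menu );
    // Give the link the same class and wrap its content in a strong element
    $menu = preg_replace(
        '/class="' . $quoted . '"><a(.*?)>(.*?)<\/a>/',
        'class="' . $class . '"><a$1 class="' . $class . '"><strong>$2</strong></a>',
        $menu
    );
    return $menu;
}

$item = '<li class="page_item page-item-4987 current_page_item"><a href="/page-1/sublevel-1-1/">Sublevel 1-1</a></li>';
echo clean_menu_classes( $item, 'current' ), "\n";
// Prints: <li class="current"><a href="/page-1/sublevel-1-1/" class="current"><strong>Sublevel 1-1</strong></a></li>
```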

If you’re like me and want your markup tidy, I hope this can be of some help.

Posted on January 13, 2011 in WordPress
