Sunday, February 20, 2011

XPaths with xmllint

xmllint is a command-line XML tool used to validate and pretty-print XML documents. More importantly, it offers an interactive shell mode in which you can use XPath expressions to print elements. For example, typing cat //body in the shell prints the body element of an HTML document.
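As a rough sketch of what shell mode does, a single cat command can be fed to it on standard input instead of being typed interactively. The file sample.xml and its contents are made up for this illustration, and the exact prompt formatting in the output may differ between xmllint versions.

```shell
# Hypothetical sample document, created only for this illustration.
cat > sample.xml <<'EOF'
<html><body>Hello World!</body></html>
EOF

# Feed one "cat" command to the xmllint shell via a bash here-string.
# The shell also prints its "/ > " prompt around the result.
xmllint --shell sample.xml <<< "cat //body"
```

The prompt text mixed into the output is why a filtering step is useful when wrapping this in a function.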

I wrote a small bash function that uses xmllint to evaluate XPath expressions easily:

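xpath()
{
    # Expect exactly two arguments: the XPath expression and the file.
    if [ $# -ne 2 ]; then
        echo "Usage: xpath xpath file"
        return 1
    fi
    # Send "cat <xpath>" to the xmllint shell and drop its "/ > " prompt lines.
    xmllint --shell "$2" <<< "cat $1" | sed '/^\/ >/d'
}

Example: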
sharfah@starship:~> xpath "//body" index.html
<body>Hello World!</body>
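
As an alternative sketch, assuming your xmllint build supports the --xpath option (older releases do not have it), the same kind of query can be run without shell mode or sed. The function name xpath2 and the file sample.xml are hypothetical and used only for illustration.

```shell
# Hypothetical sample document, created only for this illustration.
cat > sample.xml <<'EOF'
<html><body>Hello World!</body></html>
EOF

# xpath2 is a made-up name; it assumes xmllint supports --xpath.
xpath2()
{
    # Expect exactly two arguments: the XPath expression and the file.
    if [ $# -ne 2 ]; then
        echo "Usage: xpath2 xpath file"
        return 1
    fi
    # --xpath evaluates the expression and prints the matching nodes.
    xmllint --xpath "$1" "$2"
}

xpath2 "//body" sample.xml
```

Both arguments are quoted here so that file names with spaces and expressions containing shell metacharacters are passed through unchanged.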