Tuesday, October 12, 2010

Regular Expressions

Since I have to parse XML files these days, I thought of this as a good opportunity to make myself familiar with regular expressions.

Regular expression syntax looks very daunting at first, but if you persevere, you will get over your fears in half an hour. Here is an example string that I needed to parse:

<table><tr class="x1">2</tr><tr class="x2">-454</tr></table>

I wanted to extract the numbers 2 and -454. The pattern I used initially was <tr class=".*">(.*)</tr>. However, this returned just a single group, which was 2</tr><tr class="x2">-454. When I added a question mark between "*" and ")", as in <tr class=".*">(.*?)</tr>, I got my two groups with 2 and -454 in them.

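Explanation: Without the "?", ".*" is greedy: it matches as much as it can, so the group runs all the way to the last </tr>. Putting "?" after "*" does not make the string optional; it makes the quantifier lazy, so it matches as little as it can and stops at the first </tr> it reaches. The same greediness applies to the ".*" inside class="...", so it is safer to write that part as ".*?" or [^"]* as well.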
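
Here is a small sketch of the greedy versus lazy difference in Python (the language is my choice for illustration; nothing above depends on it). One assumption to flag: in both patterns the ".*" inside class="..." is written as ".*?", because with Python's re module a greedy ".*" there would itself run on to the last "> in the string and only the final row would be matched.

```python
import re

html = '<table><tr class="x1">2</tr><tr class="x2">-454</tr></table>'

# Greedy capture group: (.*) runs on to the last </tr> in the string.
greedy = re.findall(r'<tr class=".*?">(.*)</tr>', html)

# Lazy capture group: (.*?) stops at the first </tr> it can reach.
lazy = re.findall(r'<tr class=".*?">(.*?)</tr>', html)

print(greedy)  # one group that swallows both rows
print(lazy)    # one group per row
```

The lazy version gives one string per row, which can then be passed through int() to get the numbers. For anything beyond a quick extraction like this, a real XML or HTML parser is probably the more robust tool.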

There is a nice webapp called RegExr that you can use to quickly test your expressions (thanks to webapps.stackexchange).

Bonus tip: If you want to display HTML code inside your blog post,
1. you have to click on "Post Options" (just below the post edit window) and under "Compose Settings" select "Show HTML literally"
2. use "&lt;" and "&gt;" in place of "<" and ">"
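
If there is a lot of markup to paste, the replacement in step 2 can be done by a script instead of by hand. This is only a sketch using Python's standard html module; the sample snippet is just one row from the example above, and passing quote=False is my assumption that the double quotes should be left as they are.

```python
import html

snippet = '<tr class="x1">2</tr>'

# quote=False leaves double quotes alone and escapes only &, < and >.
escaped = html.escape(snippet, quote=False)

print(escaped)
```

The printed text can then be pasted into the post editor in place of the raw tags.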
