Arjan's World: Regular Expressions *Can* Be Your Friend If You Treat Them Well
Monday, October 10, 2005

Regular expressions always look like Perl to me... Incomprehensible.

Today I was looking for a way to handle some URLs in text to be displayed on a webpage. This specific webpage is fed some old input containing web links, and that input could not be changed. The not-too-difficult task ahead was to rewrite these old URLs, which follow a predictable scheme, so that they automatically appear all right on the new page according to the new scheme (the old text could not simply be changed with a Find-And-Replace action because it must still be available to the old application). It had been some time since I last used regexps, and I can say I learned what a greedy regular expression is by working with one :)

The expression "/pathtourl/.*?/" did the trick. A first attempt did not include the ?, which makes the expression greedy: .* grabs as much as it can, so the match runs on to the last / character it can find. In my case that's normally the one in the anchor closing tag </a>. That way the complete URL plus everything between the tags, up to and including '</', is replaced, leading to some very invalid HTML.
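To make the difference concrete, here is a small sketch in Python (not necessarily the language I was working in). The sample link and the "/newpath/" replacement are made up for illustration; only the two patterns come from the story above.

```python
import re

# Hypothetical old-style link; the real input looked different.
html = '<a href="/pathtourl/docs/page.html">Docs</a>'

# Greedy: .* runs on to the LAST "/" on the line, which sits in "</a>".
greedy = re.sub(r"/pathtourl/.*/", "/newpath/", html)

# Non-greedy: .*? stops at the FIRST "/" after "/pathtourl/".
lazy = re.sub(r"/pathtourl/.*?/", "/newpath/", html)

print(greedy)  # <a href="/newpath/a>
print(lazy)    # <a href="/newpath/page.html">Docs</a>
```

The greedy version eats the rest of the href, the link text and half of the closing tag, which is exactly the broken HTML I was looking at.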

So, as I mentioned, the '?' did the trick... However, I found that not all URLs in the text adhered to this scheme. Some did not have a second '/' after "/pathtourl/", so even the non-greedy expression keeps going until it reaches the '/' in </a>, leading to the same situation as described above :)
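One possible way around this, sketched in Python again, is to forbid the match from leaving the URL at all: instead of "any character", allow only characters that are neither a '/' nor a double quote, and make the extra path segment optional. This assumes the URLs sit inside double-quoted href attributes; the function name, the sample links and "/newpath/" are invented for the example.

```python
import re

# [^/"]* cannot cross a "/" or the closing quote of the attribute,
# and (?: ... )? makes the extra path segment optional.
OLD_URL = re.compile(r'/pathtourl/(?:[^/"]*/)?')

def rewrite(html):
    """Replace the old prefix (plus one optional segment) with the new one."""
    return OLD_URL.sub("/newpath/", html)

with_segment = '<a href="/pathtourl/docs/page.html">Docs</a>'
without_segment = '<a href="/pathtourl/page.html">Page</a>'

# The non-greedy pattern still breaks the second link...
broken = re.sub(r"/pathtourl/.*?/", "/newpath/", without_segment)
print(broken)                    # <a href="/newpath/a>

# ...while the restricted pattern handles both.
print(rewrite(with_segment))     # <a href="/newpath/page.html">Docs</a>
print(rewrite(without_segment))  # <a href="/newpath/page.html">Page</a>
```

Whether dropping that one segment is the right rewrite depends on the actual old and new URL schemes, so treat the pattern as a starting point and not as the final fix.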

Sometimes I feel like a bad bugfixer when creating regexps: fix the expression and see another bug pop up...


