Regex and Anchor Tags

I had been searching the Internet for a solution to a problem in a program I was working on and, sadly, didn't come up with one. I was trying to use regular expressions to find all the HTML anchor tags in a string whose URL matches a wildcard rule (e.g. secnem.com.*test.html). After many hours of thrusting my head into my keyboard, I came up with:

/<a [^><]*href=[\"\'][^\"\'><]*<rule>[^\"\'><]*[\"\'][^>]*>\s*.*\s*<\/a>/iU
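For readers not using PHP (the /.../iU syntax suggests PHP's preg_* functions, though the post doesn't say so), here is a rough Python translation. It is a sketch, not the author's code. It assumes the example rule secnem.com.*test.html, with the wildcard written as [^"'><]* and the literal dots escaped. Python has no equivalent of the U (ungreedy) modifier, so every * is written as the lazy *? instead, and re.IGNORECASE stands in for i. The names RULE, ANCHOR and the sample html string are made up for illustration.

```python
import re

# Hypothetical rule for "secnem.com.*test.html": dots escaped, and the
# wildcard replaced by the restricted class [^"'><], written lazily (*?).
RULE = r"""secnem\.com[^"'><]*?test\.html"""

# PHP's U modifier makes every quantifier lazy; Python has no such flag,
# so each * from the original pattern is spelled *? here.
ANCHOR = re.compile(
    rf"""<a [^><]*?href=["'][^"'><]*?{RULE}[^"'><]*?["'][^>]*?>\s*?.*?\s*?</a>""",
    re.IGNORECASE,  # stands in for the i modifier
)

html = (
    '<p><a href="http://secnem.com/blog/test.html">Test post</a> and '
    "<a class=\"x\" href='http://other.com/test.html'>Other</a> and "
    '<A HREF="http://secnem.com/test.html" title="t">Home</A></p>'
)

# With no capture groups, findall returns the full text of each match.
matches = ANCHOR.findall(html)
for m in matches:
    print(m)
```

Only the two secnem.com anchors are returned; the other.com link is skipped because its href never contains secnem.com before the closing quote. As in the original, . does not match newlines (that would need the s modifier, or re.DOTALL in Python), so anchor text that itself spans several lines will not be found.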

Replace <rule> with whatever URL rule you want, but for any wildcard in the URL use [^\"\'><]* instead of just .* so the match can't run outside the anchor. Basically, [^\"\'><]* means: match any run of characters except a double quote, single quote, greater-than sign, or less-than sign, none of which should appear in the href value to begin with. Literal dots in the rule should also be escaped (secnem\.com), since an unescaped . matches any character.
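The wildcard substitution can be automated. The sketch below uses a hypothetical helper, wildcard_to_fragment, which treats each .* in a rule like secnem.com.*test.html as a wildcard, swaps it for the restricted class, and escapes everything else with re.escape. It is written in Python and uses the lazy *? form, since Python has no U modifier.

```python
import re

# The post's wildcard class, written lazily (*?) because Python has no
# equivalent of PHP's U modifier.
WILDCARD = r"""[^"'><]*?"""


def wildcard_to_fragment(url_rule: str) -> str:
    """Hypothetical helper: treat each ".*" in url_rule as a wildcard,
    replace it with WILDCARD, and escape every literal part with
    re.escape so that, e.g., the dot in "secnem.com" only matches a dot."""
    return WILDCARD.join(re.escape(part) for part in url_rule.split(".*"))


fragment = wildcard_to_fragment("secnem.com.*test.html")
print(fragment)  # secnem\.com[^"'><]*?test\.html

pattern = re.compile(fragment)
# An ordinary href value matches.
print(bool(pattern.search("http://secnem.com/blog/test.html")))
# A quote between the two literal parts stops the wildcard, so no match.
print(bool(pattern.search('http://secnem.com/" onclick="test.html')))
```

The second search fails because the wildcard class refuses to cross the quote, which is the same property that keeps the full pattern inside a single href value.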

If you want to see the content of the anchor tag or the matched href, just put parentheses (capture groups) around those parts, like so:

/<a [^><]*href=[\"\']([^\"\'><]*<rule>[^\"\'><]*)[\"\'][^>]*>(\s*.*\s*)<\/a>/iU
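In Python, findall returns one tuple per match when a pattern has more than one capture group. A minimal sketch, again assuming the example rule secnem.com.*test.html, lazy *? in place of the U modifier, and hypothetical names (RULE, ANCHOR, html):

```python
import re

# Hypothetical rule: "secnem.com.*test.html" with dots escaped and the
# wildcard restricted to characters that cannot end an href value.
RULE = r"""secnem\.com[^"'><]*?test\.html"""

# Group 1 captures the whole href value, group 2 the anchor's content
# (including any surrounding whitespace, as in the original pattern).
ANCHOR = re.compile(
    rf"""<a [^><]*?href=["']([^"'><]*?{RULE}[^"'><]*?)["'][^>]*?>(\s*?.*?\s*?)</a>""",
    re.IGNORECASE,
)

html = '<a class="nav" href="http://secnem.com/blog/test.html"> Test post </a>'

for href, content in ANCHOR.findall(html):
    print(href)             # http://secnem.com/blog/test.html
    print(content.strip())  # Test post
```

Group 2 keeps the surrounding whitespace because the \s* parts sit inside the parentheses; strip() removes it.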

Hope this helps someone. You can of course adapt this to other HTML tags by replacing ‘a’ with ‘table’ or whatever, and the same goes for the href attribute. larsolavtorvik.com has a great resource for testing regex in real time, and addedbytes.com has a handy cheat sheet as well.
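Adapting the pattern to another tag or attribute is just string substitution. The hypothetical build_tag_pattern function below (Python, lazy quantifiers in place of U) plugs a tag name and an attribute name into the same shape; the rule argument is assumed to already be a regex fragment, with any literal dots escaped.

```python
import re


def build_tag_pattern(tag: str, attr: str, rule: str) -> re.Pattern:
    """Hypothetical generalisation of the anchor regex: same structure,
    with the tag and attribute names swapped in. Group 1 is the attribute
    value, group 2 the element's content up to the first closing tag."""
    t = re.escape(tag)
    a = re.escape(attr)
    return re.compile(
        rf"""<{t} [^><]*?{a}=["']([^"'><]*?{rule}[^"'><]*?)["'][^>]*?>(\s*?.*?\s*?)</{t}>""",
        re.IGNORECASE,
    )


html = '<table class="data-grid"><tr><td>1</td></tr></table>'
tables = build_tag_pattern("table", "class", "data")
print(tables.findall(html))  # [('data-grid', '<tr><td>1</td></tr>')]
```

Because the content group is lazy, nested elements of the same tag (a table inside a table, say) will end the match at the first closing tag.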
