by Leon Rosenshein

Now You Have Two Problems

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski

Of course, that's not strictly true. Even if you're not writing Perl, you probably use a regex or two every day. After all, grep and ag (The Silver Searcher) treat your input as a regex and use it to search your files.

And used correctly, a regex is great for detecting, and potentially extracting or transforming, bits of a string. Need to know if a pattern appears in your strings? Regex is great for that. Need to know if a string is a phone number? There's a good regex for that too.
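As a rough illustration of detecting versus extracting, here is a minimal Python sketch. The pattern is a hypothetical, deliberately simplified one for US-style numbers, and the names `PHONE`, `is_phone`, and `find_phones` are made up for this example; real phone formats vary far more than this.

```python
import re

# Hypothetical, simplified pattern for numbers such as 555-867-5309 or
# (555) 867-5309. It is an assumption for illustration, not a complete
# phone number validator.
PHONE = re.compile(r"\(?(\d{3})\)?[-. ]?(\d{3})[-. ]?(\d{4})")

def is_phone(text: str) -> bool:
    """Detect: does the whole string look like a phone number?"""
    return PHONE.fullmatch(text) is not None

def find_phones(text: str) -> list:
    """Extract and transform: return every match rewritten as NNN-NNN-NNNN."""
    return ["-".join(m.groups()) for m in PHONE.finditer(text)]
```

`is_phone` answers the yes/no question for a single string, while `find_phones` pulls every match out of a longer piece of text and normalizes its format.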

The thing to remember is that while a regex can scan text and has limited lookahead and lookbehind ability, it's not a parser and doesn't understand context. Arbitrary nesting of open/close pairs (I'm talking about you, HTML) will drive the regex (and the developer) nuts, so just don't do it. If you're actually parsing HTML, use a real HTML parser and work with the DOM it builds.
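To make the nesting problem concrete, here is a small sketch using Python's standard `html.parser`. That module is an event-driven parser rather than a full DOM, but it shows the same idea. The sample markup and the `DivDepth` class are made up for illustration.

```python
import re
from html.parser import HTMLParser

HTML = "<div>outer <div>inner</div> tail</div>"

# A non-greedy regex stops at the first closing tag, so the outer div
# is cut short and the result contains a stray opening tag.
naive = re.search(r"<div>(.*?)</div>", HTML).group(1)

class DivDepth(HTMLParser):
    """Tracks how deeply <div> elements nest and collects their text."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text that appears inside at least one div.
        if self.depth > 0:
            self.text.append(data)

parser = DivDepth()
parser.feed(HTML)
parser.close()
```

Here `naive` ends up as `outer <div>inner`, while the parser sees both levels of nesting and recovers the full text.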

Even when a regex is the right answer, be careful. When your regex starts to become write-only code, you've probably gone too far. Don't be afraid to split your regex into parts, capturing a bigger block of text and then using more specific regexes on it. It might not be quite as performant, but it will be more understandable and maintainable, and that's what we're really going for. When the next unsuspecting programmer (you?) looks at it in six months, will they be able to understand what's happening?
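One way to split the work, sketched in Python under the assumption of a made-up log format (`date LEVEL key=value ...`). The names `LINE`, `PAIR`, and `parse_line` are hypothetical.

```python
import re

# Step 1: a coarse pattern that only splits the line into its big pieces.
LINE = re.compile(r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<rest>.*)")

# Step 2: a small, focused pattern applied only to the captured remainder.
PAIR = re.compile(r"(\w+)=(\S+)")

def parse_line(line):
    """Return a dict for a well-formed line, or None if it doesn't match."""
    m = LINE.fullmatch(line)
    if m is None:
        return None
    return {
        "date": m["date"],
        "level": m["level"],
        "fields": dict(PAIR.findall(m["rest"])),
    }
```

Each pattern is short enough to read at a glance, and either one can be changed or tested without touching the other.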

And finally, a regex is a great place for an edge condition to hide, so when possible don't write your own. There are well-known regexes for common things: email addresses, phone numbers, SSNs, credit card numbers, etc. If the thing you're looking for has a Backus–Naur form (BNF) grammar, there's probably a regex for it already, so use it. If not, the BNF will help you write the regex.
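As a small sketch of going from a grammar to a regex, here is a Python example. The grammar in the comment is a hypothetical one written for this illustration, and the translation only works this directly because none of its rules are recursive.

```python
import re

# Hypothetical grammar, assumed only for illustration:
#   <number> ::= <sign>? <digits> ( "." <digits> )?
#   <sign>   ::= "+" | "-"
#   <digits> ::= <digit>+
#   <digit>  ::= "0" | "1" | ... | "9"
SIGN = r"[+-]"
DIGITS = r"[0-9]+"

# Each grammar rule becomes a named piece, and the top rule combines them.
NUMBER = re.compile(rf"{SIGN}?{DIGITS}(?:\.{DIGITS})?")

def is_number(text):
    """True if the whole string matches the <number> rule above."""
    return NUMBER.fullmatch(text) is not None
```

Keeping one named piece per grammar rule makes it easier to check the regex against the grammar, which is where the edge conditions tend to show up.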