Extending An Existing Theory WAY Too Far


([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|([\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)+

Look at this crap. Can you read it? I sure as hell can’t, and I consider myself fairly proficient with regex. I use the mail program mutt with the indexer notmuch precisely so I can run regex searches across all my mail, a power you only really appreciate when you need to find that one address someone sent you 5 years ago. Then you can just type:

[1-9][0-9]{2,4} \w+ \w+

It gave me about 20 results, and picking the address out from among them was trivial.
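If you want to play with that kind of search outside a mail client, here is a rough Python sketch. The pattern is a street-address guess written with the interval quantifier `{2,4}` and `\w+` for whole words; the function name `find_addresses` and the sample lines are made up for illustration, not taken from my actual mail.

```python
import re

# A 3-to-5 digit house number, then two words (e.g. "1428 Elm Street").
# This is a loose heuristic, not a real address grammar.
ADDRESS = re.compile(r"[1-9][0-9]{2,4} \w+ \w+")


def find_addresses(lines):
    """Return the first address-looking match from each line that has one."""
    hits = []
    for line in lines:
        match = ADDRESS.search(line)
        if match:
            hits.append(match.group(0))
    return hits


# Hypothetical sample data standing in for message bodies.
sample = [
    "Meet me at 1428 Elm Street on Friday.",
    "Invoice total: 12 items",
    "Call me at 5 pm",
]
print(find_addresses(sample))
```

It will happily return false positives, which is fine when you are scanning 20 results by eye.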

Regex is a very handy tool. I wish I had known about it in banking, since it would have made searching 10-Ks a hell of a lot easier.

But it does have limits. For something like email validation, it’s easier to just write a simple parser (think of a parser as regex’s big brother, since it keeps track of the context of the symbols it reads). For something like matching nested HTML tags, regex literally can’t do it: a true regular expression has no memory for how deep the nesting goes. It’s too weak. For programming languages, you actually, provably, need a parser.
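To give a feel for how small the "simple parser" route can be, here is a minimal sketch in Python. It is my own approximation of the rules the monster regex encodes, under some loud assumptions: it skips quoted local parts entirely, it requires at least two domain labels, and the names `valid_email` and `valid_label` are invented for this example. Treat it as an illustration, not a complete RFC 5322 validator.

```python
import string

# Characters allowed in an unquoted local-part atom (RFC 5322 "atext").
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")
ALNUM = set(string.ascii_letters + string.digits)


def valid_label(label):
    """A domain label: 1-63 chars, alphanumeric at both ends, hyphens inside."""
    if not 1 <= len(label) <= 63:
        return False
    if label[0] not in ALNUM or label[-1] not in ALNUM:
        return False
    return all(c in ALNUM or c == "-" for c in label)


def valid_email(address):
    """Check an address piece by piece instead of with one giant pattern."""
    # Split on the last "@" so a stray "@" ends up in the local part and fails.
    local, sep, domain = address.rpartition("@")
    if not sep:
        return False

    # Local part: dot-separated, non-empty atoms (quoted strings not handled).
    for atom in local.split("."):
        if not atom or any(c not in ATEXT for c in atom):
            return False

    # Domain: at least two valid labels separated by dots.
    labels = domain.split(".")
    return len(labels) >= 2 and all(valid_label(label) for label in labels)


print(valid_email("alice@example.com"))
print(valid_email("a..b@example.com"))
```

Each rule lives on its own line with its own comment, so when one of them turns out to be wrong you can actually find it.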

In any case, why would you ever do something like the above rather than just learn a new and not very hard skill? This is a perfect example of someone pushing a tool way too far rather than just making a new one, and pushing it that far takes way more effort. If it weren’t such a bad idea, I’d be kind of impressed.
