Validating email addresses

There’s a thread on programming reddit reminding developers that + is a valid character in an email address. So you should make sure that your validations allow for this.

The regular expression that I’ve been using to validate emails is

/^([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})$/i

This RE will validate most email addresses correctly. Fully validating an email address is quite complex (it’s not even fully clear which characters are allowed) and for the purposes of input validation this is good enough.

>> email = /^([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})$/i
>> email =~ 'fred.com' => nil
>> email =~ '@fred.com' => nil
>> email =~ 'f@fred.com' => 0
>> email =~ 'fred@fred.com' => 0
>> email =~ 'fred+test@fred.com' => 0
>> email =~ 'fred.jones+test@fred.com' => 0
>> email =~ 'y@x.org' => 0
>> email =~ 'bob@.........com' => nil
>> email =~ 'bob@-.com' => 0

If you spot any more cases where this RE fails, please post them in the comments. Two known problems so far: it accepts ‘bob@-.com’, and it rejects addresses containing spaces, which are apparently allowed in the RFC.
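
One way to close the ‘bob@-.com’ hole, as a rough sketch rather than a full RFC check, is to require every domain label to start and end with a letter or digit. The name STRICTER_EMAIL is made up for illustration. Swapping ^ and $ for \A and \z is an addition as well: in Ruby, ^ and $ match at line breaks inside a string, not only at its two ends. This version still rejects addresses containing spaces.

```ruby
# Sketch only: STRICTER_EMAIL is a hypothetical name, not part of the original RE.
# Each domain label must begin and end with a letter or digit; hyphens are
# allowed only in the middle. \A and \z anchor to the whole string.
STRICTER_EMAIL = /\A([^@\s]+)@((?:[a-z0-9](?:[-a-z0-9]*[a-z0-9])?\.)+[a-z]{2,})\z/i

samples = ['fred+test@fred.com', 'a@my-site.co.uk', 'bob@-.com', 'bob@.........com']
samples.each do |address|
  verdict = (STRICTER_EMAIL =~ address) ? 'accepted' : 'rejected'
  puts "#{address}: #{verdict}"
end
```

The first two samples are accepted and the last two are rejected.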

Edit: Reading through the comments on the reddit thread makes me wonder if there is any point to email validation. Some client-side validation may be useful to warn the user if they enter an obvious non-email address. If you send a confirmation email then that’s your validation. If you don’t then it’s easy for someone to make up an email with a valid structure. So is email validation worth the risk of having someone not be able to sign up for your site because they have some unusual characters in their email address?

Good point re the point of validating, if you catch my drift! But an optimal e-mail address RE is still useful for identifying e-mail addresses within text.

Maybe you should be limiting the length of the TLD to 4 chars so this wouldn't match:

"Hi Aidan, my e-mail address is eoghan@contrast.iecould you forward that invoice?".

Better still:

/^([^@\s]+)@((?:[-a-z0-9]+.)(+[a-z]{2,3}|.mobi))$/i

Is that syntax correct?

The syntax is invalid. But I don’t think there’s much point in trying to limit the length of the TLD. There are lots of TLDs other than .mobi that are 4 or more characters. e.g. .name, .museum, .travel.

Also the original RE I gave is for matching against an entire string, e.g. before insertion in a database. If you were using it to extract emails from a string you should drop the ^ from the start and the $ from the end. And change the grouping brackets to capture the entire email address.
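
A minimal sketch of that extraction variant, assuming String#scan is the tool of choice. The names EMAIL_IN_TEXT and extract_emails are made up for illustration. Rather than wrapping the whole address in a capturing group, this version uses only non-capturing groups, so scan returns each complete match.

```ruby
# Sketch only: EMAIL_IN_TEXT and extract_emails are hypothetical names.
# Same pattern as the validating RE, minus the ^ and $ anchors, with
# non-capturing groups so String#scan returns whole addresses.
EMAIL_IN_TEXT = /[^@\s]+@(?:[-a-z0-9]+\.)+[a-z]{2,}/i

def extract_emails(text)
  text.scan(EMAIL_IN_TEXT)
end

p extract_emails('Contact fred+test@fred.com or y@x.org today.')
# => ["fred+test@fred.com", "y@x.org"]
p extract_emails('my e-mail address is eoghan@contrast.iecould you forward that invoice?')
# => ["eoghan@contrast.iecould"]
```

The second call shows the run-on problem raised earlier in the thread: nothing in the pattern can tell where ".ie" ends and "could" begins. Punctuation directly before an address, such as an opening bracket, would also be swallowed into the local part.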

It is a bit pointless beyond the "does it have an @ and one . after the @" level. It is a lot of hassle to hastily update production servers when a client tells you your system is not accepting their email address. As programmers we tend to want to validate any email input: the delusion of "correctness."
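
That level of checking could look something like the sketch below. The helper name plausible_email? is made up, and requiring at least one character before the @ is an extra assumption on top of what the comment describes.

```ruby
# Sketch only: plausible_email? is a hypothetical helper name.
# Checks for an @ with at least one . somewhere after it. Requiring
# something before the @ is an extra assumption, not part of the comment.
def plausible_email?(input)
  # rpartition splits on the last @; all parts are empty strings if there is none.
  local, at, domain = input.to_s.strip.rpartition('@')
  !at.empty? && !local.empty? && domain.include?('.')
end

p plausible_email?('fred+test@fred.com') # => true
p plausible_email?('fred.com')           # => false
p plausible_email?('fred@localhost')     # => false
```

A check this loose lets through plenty of junk, but it will not lock out someone with an unusual yet valid address.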

Part of our validation involves sending an email to the address. That is a reasonable test IMO though you want to be sure it can handle temporary problems. Also not all systems should insist on validating via sending out an email.
