Regex tutorial: matching an email
A great topic! Matching an email address using regular expressions (regex) can be a bit tricky, but don't worry, I'm here to guide you through it.
The Basics
Before we dive into the regex, let's cover some basic requirements for an email address:
- It must start with a local part (the part before the @ symbol).
- The local part can contain letters, numbers, dots (.), hyphens (-), and underscores (_). The pattern below also accepts percent signs (%) and plus signs (+).
- The local part must be followed by the @ symbol.
- The domain part (the part after the @ symbol) must contain at least one dot (.) and cannot be longer than 253 characters.
- The domain part can contain letters, numbers, dots (.), and hyphens (-). Underscores are not allowed in the domain, and the pattern below rejects them there.
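The requirements above can also be checked step by step without a regex. The following is a minimal sketch, assuming Python; the helper name check_basic_structure and the two character sets are hypothetical and cover only the rules listed in this section, not the full email standards.

```python
import string

# Local part: letters, digits, dots, hyphens, underscores.
LOCAL_CHARS = set(string.ascii_letters + string.digits + "._-")
# Domain part: letters, digits, dots, hyphens (no underscores in hostnames).
DOMAIN_CHARS = set(string.ascii_letters + string.digits + ".-")


def check_basic_structure(address: str) -> bool:
    """Hypothetical helper: apply the basic requirements listed above."""
    # Split at the first @; sep is empty when no @ is present.
    local, sep, domain = address.partition("@")
    if not sep or not local or not domain:
        return False  # missing @, local part, or domain part
    if not set(local) <= LOCAL_CHARS:
        return False  # local part contains a character outside the list
    if not set(domain) <= DOMAIN_CHARS:
        return False  # also rejects a second @ in the domain
    # The domain needs at least one dot and at most 253 characters.
    return "." in domain and len(domain) <= 253
```

This is looser than the regex in some ways (it does not look at the TLD) and stricter in others (it enforces the 253-character limit, which the regex does not).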
The Regex Pattern
Here's a regex pattern that matches most common email addresses:
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Let's break it down:
- ^ matches the start of the string.
- [a-zA-Z0-9._%+-]+ matches one or more characters that are letters (both uppercase and lowercase), numbers, dots (.), hyphens (-), underscores (_), percent signs (%), or plus signs (+). This matches the local part.
- @ matches the @ symbol.
- [a-zA-Z0-9.-]+ matches one or more characters that are letters (both uppercase and lowercase), numbers, dots (.), or hyphens (-). This matches the domain part.
- \. matches a literal dot (.) character.
- [a-zA-Z]{2,} matches the top-level domain (TLD), which must be at least 2 characters long and contain only letters (both uppercase and lowercase).
- $ matches the end of the string.
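To keep the breakdown next to the pattern itself, the same regex can be written in verbose mode with one piece per line. This is a sketch assuming Python's re module; the names EMAIL_RE and is_email are hypothetical.

```python
import re

# Same pattern as above, split into its parts. In re.VERBOSE mode,
# whitespace and "#" comments outside character classes are ignored.
EMAIL_RE = re.compile(
    r"""
    ^                   # start of the string
    [a-zA-Z0-9._%+-]+   # local part
    @                   # the @ symbol
    [a-zA-Z0-9.-]+      # domain part
    \.                  # literal dot before the TLD
    [a-zA-Z]{2,}        # TLD: at least two letters
    $                   # end of the string
    """,
    re.VERBOSE,
)


def is_email(address: str) -> bool:
    """Hypothetical helper: True if the address matches the pattern."""
    return EMAIL_RE.match(address) is not None
```

One Python-specific detail: $ also matches just before a trailing newline, so a caller who needs to reject "name@example.com\n" may prefer \Z in place of $.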
Example Matches
Here are some examples of email addresses that match this regex pattern:
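As an illustration, the short Python sketch below runs the pattern against a few addresses. The addresses are hypothetical, chosen here to exercise each allowed character, and the names PATTERN, SHOULD_MATCH, and matches are likewise assumptions.

```python
import re

PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

# Hypothetical addresses, made up for this illustration.
SHOULD_MATCH = [
    "john.doe@example.com",              # dot in the local part
    "jane_doe+news@mail.example.org",    # underscore, plus sign, subdomain
    "user-01@sub-domain.example.co.uk",  # hyphens and a multi-part suffix
    "a%b@example.io",                    # percent sign, two-letter TLD
]

# Keep only the addresses the pattern accepts.
matches = [addr for addr in SHOULD_MATCH if PATTERN.match(addr)]
```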
Example Non-Matches
Here are some examples of email addresses that do not match this regex pattern:
- @example.com (missing local part)
- john.doe@ (missing domain part)
- john.doe@example (missing TLD)
- [email protected] (extra dot in the domain part)
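The rejections can be confirmed with a short Python sketch. The trailing-dot and double-dot addresses are hypothetical additions, included because whether an extra dot is rejected depends on where it sits: the domain character class itself contains a dot, so consecutive dots inside the domain still pass.

```python
import re

PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

SHOULD_NOT_MATCH = [
    "@example.com",           # missing local part
    "john.doe@",              # missing domain part
    "john.doe@example",       # missing TLD
    "john.doe@example.com.",  # hypothetical: extra dot at the end
]

# Every address in the list should be rejected by the pattern.
rejected = [addr for addr in SHOULD_NOT_MATCH if PATTERN.match(addr) is None]

# Caveat: a doubled dot inside the domain is accepted, because
# [a-zA-Z0-9.-]+ is free to consume "example." before the final "\.".
double_dot_accepted = PATTERN.match("john.doe@example..com") is not None
```

If consecutive dots must be rejected, one option is to match the domain label by label, for example (?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,} in place of the domain and TLD pieces.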
Tips and Variations
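- To match internationalized domain names (IDNs), you can add a Unicode range such as \x{00C0}-\x{00FF} inside the character class for the domain part. This range covers only the Latin-1 letters such as é and ü (plus the × and ÷ signs), so it handles some Western European names rather than IDNs in general.
- To match email addresses with non-ASCII characters, you can use the range \x{0000}-\x{10FFFF} inside a character class. This range spans every Unicode code point, so a class built from it accepts any character at all and is best narrowed to the scripts you actually expect.
- To match email addresses with comments (e.g., [email protected] (comment)), you can add a separate regex pattern to match the comment.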
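Two of these variations can be sketched in Python. The \x{...} notation is Perl/PCRE syntax; Python's re module writes the same code points as \u00C0-\u00FF. The names LATIN1_DOMAIN_RE, TRAILING_COMMENT_RE, and strip_trailing_comment are hypothetical, and the comment pattern assumes a single, non-nested parenthesized comment at the end of the text.

```python
import re

# Variation 1: the base pattern with the Latin-1 range U+00C0..U+00FF
# added to the domain character class.
LATIN1_DOMAIN_RE = re.compile(
    r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9\u00C0-\u00FF.-]+\.[a-zA-Z]{2,}$"
)

# Variation 2: a separate pattern for one trailing "(comment)".
# Assumption: the comment is not nested and sits at the end of the text.
TRAILING_COMMENT_RE = re.compile(r"\s*\([^()]*\)\s*$")


def strip_trailing_comment(text: str) -> str:
    """Hypothetical helper: drop a trailing parenthesized comment."""
    return TRAILING_COMMENT_RE.sub("", text)


# Usage: strip the comment first, then apply the address pattern.
cleaned = strip_trailing_comment("john.doe@example.com (work)")
is_valid = LATIN1_DOMAIN_RE.match(cleaned) is not None
```

Running the comment pattern as a separate first step keeps the address pattern unchanged, at the cost of a second pass over the input.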
I hope this tutorial has helped you understand how to match email addresses using regular expressions!