Internationalized domain names email validation regex
A very specific and interesting topic!
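Internationalized Domain Names (IDNs) allow domain names to contain non-ASCII characters, such as accented letters and characters from non-Latin scripts. In DNS, each non-ASCII label is represented in an ASCII-compatible (Punycode) form that starts with the prefix xn--. To validate an email address that may contain an IDN, you can therefore use an ASCII-only regular expression, as long as it accepts xn-- labels and the domain has been converted to that form first.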
Here's a regex pattern that should help you validate an email address with IDNs:
^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\.?([a-zA-Z]{2,}|xn--[a-zA-Z0-9]{1,59})$
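As a minimal sketch of how the pattern might be used, the Python snippet below compiles it unchanged and checks a few sample addresses. The helper name is_valid_email and the sample addresses are assumptions made for illustration; they are not part of the pattern itself.

```python
import re

# The pattern from the text, unchanged, split across lines for readability.
# Raw strings keep the backslashes literal.
EMAIL_RE = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
    r"\.?([a-zA-Z]{2,}|xn--[a-zA-Z0-9]{1,59})$"
)


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the pattern (hypothetical helper)."""
    return EMAIL_RE.match(address) is not None


if __name__ == "__main__":
    samples = [
        "user@example.com",       # plain ASCII domain
        "user@example.xn--p1ai",  # IDN TLD in Punycode form
        "user@-example.com",      # label starts with a hyphen
        "user@bücher.example",    # raw non-ASCII domain, not converted
    ]
    for sample in samples:
        print(sample, is_valid_email(sample))
```

The first two samples match and the last two do not, because the pattern only contains ASCII character classes.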
Let's break down this regex pattern:
Local part (^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+):
- ^ matches the start of the string.
- [a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-] matches a single character from the following set:
  - ASCII letters (a-z, A-Z). The pattern contains no Unicode property such as \p{L}, so non-ASCII letters are not matched.
  - Digits (0-9)
  - Special characters (., !, #, $, %, &, ', *, +, /, =, ?, ^, _, `, {, |, }, ~, -)
- The + quantifier repeats that character class one or more times.
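To see what the local-part character class accepts on its own, here is a small Python sketch. The helper name local_part_ok is hypothetical, and testing the fragment in isolation with fullmatch is an assumption made for illustration.

```python
import re

# Only the local-part fragment of the pattern; fullmatch() requires the
# whole string to consist of characters from the class.
LOCAL_PART_RE = re.compile(r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+")


def local_part_ok(local: str) -> bool:
    """Return True if every character is in the class (hypothetical helper)."""
    return LOCAL_PART_RE.fullmatch(local) is not None


if __name__ == "__main__":
    for candidate in ["first.last+tag", "o'brien", "josé", "a b", ""]:
        print(repr(candidate), local_part_ok(candidate))
```

Note that the class places no restriction on where dots appear, so a local part such as "..a" also passes this check.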
Domain (@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\.?):
- @ matches the @ symbol.
- [a-zA-Z0-9] matches a single ASCII letter or digit (the first character of the first label).
- (?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])? matches an optional sequence of:
  - 0 to 61 occurrences of letters, digits, or hyphens
  - followed by a letter or digit, so a label cannot end with a hyphen and is at most 63 characters long
- (?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)* matches an optional sequence of:
  - a dot (\.)
  - followed by another label (matching the same pattern as above)
  - repeated 0 or more times
- \.? matches an optional dot (\.) before the top-level domain. Because this dot is optional, the pattern does not strictly require a dot in the domain, so a dotless host such as user@localhost also matches.
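The label rules described above can be checked in isolation. The following Python sketch pulls out the single-label fragment of the pattern; the names LABEL_RE and label_ok are assumptions introduced for this example.

```python
import re

# One domain label as the pattern describes it: a letter or digit, then
# optionally up to 61 letters/digits/hyphens followed by a final letter or digit.
LABEL_RE = re.compile(r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?")


def label_ok(label: str) -> bool:
    """Return True if the whole label fits the fragment (hypothetical helper)."""
    return LABEL_RE.fullmatch(label) is not None


if __name__ == "__main__":
    candidates = ["example", "a", "-abc", "abc-", "xn--bcher-kva", "a" * 63, "a" * 64]
    for candidate in candidates:
        print(candidate[:20], len(candidate), label_ok(candidate))
```

Labels that begin or end with a hyphen, or that are longer than 63 characters, fail this check, while a Punycode label such as xn--bcher-kva passes because hyphens are allowed in the middle of a label.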
Top-level domain (([a-zA-Z]{2,}|xn--[a-zA-Z0-9]{1,59})$):
- ([a-zA-Z]{2,}|xn--[a-zA-Z0-9]{1,59}) matches the top-level domain (TLD):
  - [a-zA-Z]{2,} matches 2 or more ASCII letters (e.g., "com", "org")
  - | is an OR operator
  - xn-- is the prefix for the Punycode form of an IDN TLD (e.g., "xn--p1ai")
  - [a-zA-Z0-9]{1,59} matches 1 to 59 letters or digits. Hyphens are not in this character class, so a Punycode TLD that contains a further hyphen after the prefix is rejected.
- $ matches the end of the string.
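The two TLD alternatives can also be exercised on their own. This is a sketch under the assumption that the TLD fragment is tested in isolation; TLD_RE and tld_ok are hypothetical names.

```python
import re

# The TLD fragment of the pattern: either 2+ ASCII letters, or the xn--
# prefix followed by 1 to 59 letters or digits (no hyphens).
TLD_RE = re.compile(r"(?:[a-zA-Z]{2,}|xn--[a-zA-Z0-9]{1,59})")


def tld_ok(tld: str) -> bool:
    """Return True if the whole string fits the TLD fragment (hypothetical helper)."""
    return TLD_RE.fullmatch(tld) is not None


if __name__ == "__main__":
    for candidate in ["com", "c", "c0m", "xn--p1ai", "xn--vermgensberater-ctb"]:
        print(candidate, tld_ok(candidate))
```

The last candidate fails only because of the hyphen after the prefix. If such TLDs need to be accepted, one option is to add a hyphen to the second character class, at the cost of a looser check.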
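This regex pattern should match most common email addresses, including those whose domain is an IDN written in its Punycode (xn--) form. It does not match a domain written with raw non-ASCII characters, or a local part that contains non-ASCII characters. Keep in mind that email address validation is a complex topic, and this pattern does not cover all possible valid email addresses.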
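Because the pattern is ASCII-only, an address whose domain is typed in Unicode has to be converted before matching. The sketch below shows one possible way to do that in Python with the built-in "idna" codec, which implements the older IDNA 2003 rules; the function names to_ascii_address and is_valid_idn_email are assumptions made for this example.

```python
import re

# The pattern from the text, unchanged.
EMAIL_RE = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
    r"\.?([a-zA-Z]{2,}|xn--[a-zA-Z0-9]{1,59})$"
)


def to_ascii_address(address: str) -> str:
    """Convert the domain of an address to Punycode (hypothetical helper)."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("address has no @ sign")
    # Python's built-in "idna" codec encodes each non-ASCII label as xn--...
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


def is_valid_idn_email(address: str) -> bool:
    """Convert the domain first, then apply the ASCII-only pattern."""
    try:
        return EMAIL_RE.match(to_ascii_address(address)) is not None
    except (UnicodeError, ValueError):
        # Raised for a missing @ sign or a domain the codec cannot encode.
        return False


if __name__ == "__main__":
    print(to_ascii_address("user@bücher.example"))
    print(is_valid_idn_email("user@bücher.example"))
    print(is_valid_idn_email("no-at-sign"))
```

Only the domain is converted here; a local part with non-ASCII characters is still rejected by the pattern.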
Remember to adjust the regex pattern according to your specific requirements and the programming language you're using.