URL Regex Python Validator
Use the URL Regex Python Validator to accurately test patterns for validating website links in Python. Whether you’re checking for http, https, or complex paths, this tool helps ensure your URLs are clean, correct, and reliable. For more regex testing, explore the Python Regex Tester, Email Regex Python Validator, or IP Address Regex Python Validator.
[A-Z] : uppercase letters
[a-z] : lowercase letters
[0-9] : digits
\. : a literal dot
+ : one or more of the preceding
* : zero or more of the preceding
? : optional (zero or one)
^ : start of string
$ : end of string
Regular Expression - Documentation
What is the URL Regex Python Validator?
The URL Regex Python Validator is designed to help you check whether your regular expressions correctly match valid web addresses. This includes checking for:
Protocols like http or https
Domain names and subdomains
Optional ports, paths, query parameters, and fragments
It uses Python’s re module and is ideal for form validation, web crawling, data parsing, and link-checking tasks.
Common URL Regex Patterns
Basic HTTP/HTTPS URL
^(http|https):\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Matches: http://example.com, https://qodex.ai
Does not match: example.com, ftp://server.com
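As a quick sanity check, here is a minimal sketch that runs this basic pattern through Python's re module (the sample URLs are illustrative):
import re

BASIC_URL = re.compile(r'^(http|https):\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

for candidate in ("http://example.com", "https://qodex.ai", "example.com", "ftp://server.com"):
    print(candidate, "->", bool(BASIC_URL.fullmatch(candidate)))
# http://example.com -> True
# https://qodex.ai -> True
# example.com -> False
# ftp://server.com -> False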
Full URL with Optional Paths & Queries
^(http|https):\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/[a-zA-Z0-9\-._~:/?#[\]@!$&'()*+,;=]*)?$
Matches: https://site.com/path?search=value, http://domain.org
With Optional Port
^(http|https):\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(:\d{2,5})?(\/.*)?$
Matches: http://example.com:8000/home, https://api.site.com:443/v1 (note that a bare hostname like localhost has no dot or TLD, so it needs the localhost-aware patterns shown later in this guide)
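To see the optional port group in action, a small, illustrative sketch (sample URLs are ours) could look like this:
import re

PORT_URL = re.compile(r'^(http|https):\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(:\d{2,5})?(\/.*)?$')

print(bool(PORT_URL.fullmatch("https://api.site.com:443/v1")))   # True
print(bool(PORT_URL.fullmatch("http://example.com:8000/home")))  # True
print(bool(PORT_URL.fullmatch("http://localhost:8000/home")))    # False: no dot/TLD in "localhost"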
What Does the Regex Actually Check?
HTTP Basic Authentication: Optionally allows user:password@ credentials before the domain.
Domain Structure: Enforces at least one subdomain (e.g., the www in www.example.com), accepts dashes within subdomains, and ensures each subdomain and the top-level domain don't exceed length limits (max 63 characters each, full domain under 253).
Top-Level Domain: Only allows alphanumeric characters (no dashes).
Localhost Support: Accepts localhost as a valid domain.
Port Numbers: Optionally matches ports up to 5 digits (e.g., :8000).
IPv4 Addresses: Recognizes standard IPv4 addresses in the netloc.
IPv6 Addresses: For IPv6 validation, you’ll want to supplement with a dedicated IPv6 validator, as the regex alone may not cover all edge cases.
Handling Complex and Edge Cases
While the above regex patterns cover most use-cases, URLs in the wild can be tricky—especially with top-level domains like .co.uk or unconventional subdomain structures. If you need a more robust solution that accounts for these "weird cases," consider a regex pattern that also allows for:
Optional protocol (e.g., http://, https://, or none)
Optional subdomains (like www.)
Support for multi-part TLDs (e.g., co.uk)
Paths, query strings, and fragments
Hyphens in domain names
Example Enhanced Regex (Python-style)
import re

regex = re.compile(
    r"(\w+://)?"            # protocol (optional)
    r"(\w+\.)?"             # subdomain (optional)
    r"(([\w-]+)\.(\w+))"    # domain
    r"(\.\w+)*"             # additional TLD parts (optional)
    r"([\w\-\.\_\~/]*)"     # path, query, fragments (optional)
)
Test Cases for Thoroughness
This more flexible approach will match a variety of real-world URLs, such as:
http://www.google.com
https://google.co.uk
google.com/~user/profile
www.example.org
https://sub.domain.co.uk/path/to/page
example.com
.google.com
(edge case, may require post-processing)
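A rough way to exercise the list above—assuming the regex object compiled in the previous snippet—is a simple loop (a sketch, not a full test suite):
test_urls = [
    "http://www.google.com",
    "https://google.co.uk",
    "google.com/~user/profile",
    "www.example.org",
    "https://sub.domain.co.uk/path/to/page",
    "example.com",
    ".google.com",  # edge case: leading dot may need post-processing
]

for url in test_urls:
    print(url, "->", bool(regex.search(url)))  # the pattern is unanchored, so search() is enough here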
By testing against a broad set of examples—including those with extra dots, missing subdomains, or unusual TLDs—you can ensure your regex is both comprehensive and resilient.
Feel free to adjust the patterns and test cases as needed to suit the specific requirements of your application. Regex is powerful, but always test thoroughly to avoid surprises!
Advanced Regex for Edge Cases
A more thorough regex can handle authentication, IPv4/IPv6 addresses, localhost, port numbers, and more:
import re
from urllib.parse import urlparse

DOMAIN_FORMAT = re.compile(
    r"(?:^(\w{1,255}):(.{1,255})@|^)"                   # http basic authentication [optional]
    r"(?:(?:(?=\S{0,253}(?:$|:))"                       # full domain must be 253 characters or fewer
    r"((?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+"     # subdomains (max 63 characters each, dashes allowed in between)
    r"(?:[a-z0-9]{1,63})))"                             # top level domain, no dashes allowed
    r"|localhost)"                                      # accept "localhost" as well
    r"(:\d{1,5})?",                                     # port [optional]
    re.IGNORECASE
)
SCHEME_FORMAT = re.compile(
    r"^(http|hxxp|ftp|fxp)s?$",                         # scheme: http(s) or ftp(s)
    re.IGNORECASE
)

def validate_url(url: str):
    url = url.strip()
    if not url:
        raise Exception("No URL specified")
    if len(url) > 2048:
        raise Exception(f"URL exceeds its maximum length of 2048 characters (given length={len(url)})")
    result = urlparse(url)
    scheme = result.scheme
    domain = result.netloc
    if not scheme:
        raise Exception("No URL scheme specified")
    if not re.fullmatch(SCHEME_FORMAT, scheme):
        raise Exception(f"URL scheme must either be http(s) or ftp(s) (given scheme={scheme})")
    if not domain:
        raise Exception("No URL domain specified")
    if not re.fullmatch(DOMAIN_FORMAT, domain):
        raise Exception(f"URL domain malformed (domain={domain})")
    return url
This approach splits the URL and validates the scheme and domain separately, handling a wider array of valid URLs (including those with authentication and ports). For even greater accuracy (such as validating IPv6), you might want to add an IPv6 validator.
Alternative: Using Validation Libraries
While regex is great for quick URL checks, Python has some powerful validation libraries that can save you time and headaches, especially when edge cases start popping up.
Using the validators Package
The validators package provides simple functions for validating URLs (and many other types of data like emails and IP addresses). Here’s how you can use it:
import validators

print(validators.url("http://localhost:8000"))  # True
print(validators.url("ftp://invalid.com"))      # ValidationFailure object (evaluates to False)
For more robust code, consider wrapping this check to always return a boolean:
import validators
from validators import ValidationFailure

def is_string_an_url(url_string: str) -> bool:
    # Always strip whitespace before validating!
    result = validators.url(url_string.strip())
    return result is True  # Only True is valid; ValidationFailure is falsy
Examples
print(is_string_an_url("http://localhost:8000"))   # True
print(is_string_an_url("http://.www.foo.bar/"))    # False
print(is_string_an_url("http://localhost:8000 "))  # True (after .strip())
Tip: Always trim leading and trailing spaces before validating URLs, as even a single space will cause most validators—including regex and libraries like these—to reject the input.
Even More Powerful: Pydantic and Django
If you’re using frameworks like Pydantic or Django, you get validation utilities that handle a lot of this for you:
Validation Using Django’s URLValidator
If you’re already using Django, leverage its built-in URL validator for comprehensive checks:
from django.core.validators import URLValidator from django.core.exceptions import ValidationError def is_string_an_url(url_string: str) -> bool: validate_url = URLValidator() try: validate_url(url_string.strip()) return True except ValidationError: return False
Examples
print(is_string_an_url("https://example.com")) # True print(is_string_an_url("not a url")) # False
Adding Django just for its URL validation is probably overkill, but if you’re in a Django project already, this is one of the most reliable approaches.
Validation Using Pydantic
Pydantic offers types like AnyHttpUrl
for strict URL validation:
from pydantic import BaseModel, AnyHttpUrl, ValidationError

class MyConfModel(BaseModel):
    URI: AnyHttpUrl

try:
    myAddress = MyConfModel(URI="http://myurl.com/")
    print(myAddress.URI)
except ValidationError:
    print('Invalid destination')
This approach raises exceptions for invalid URLs and supports a variety of URL types.
With these approaches—regex for quick checks, and libraries for thorough validation—you can confidently handle URL validation in a variety of Python projects. Whether you want something lightweight for a script or full-featured for an enterprise app, Python’s got you covered.
Beyond Regex: Defensive Validation
While regex is great for quick URL checks, rigorous validation often means adding more logic. Consider these defensive steps:
Trim whitespace before validation—accidental spaces cause most validators to reject otherwise valid URLs.
Check for empty input and enforce reasonable length limits (e.g., 2048 characters).
Validate scheme: Only allow http, https, or your required protocols.
Domain verification: Use regex or libraries to ensure the domain is well-formed.
Here’s an example of thorough validation logic:
import re
import urllib.parse

SCHEME_FORMAT = r"https?|ftp"
DOMAIN_FORMAT = r"[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

def validate_url(url: str):
    url = url.strip()
    if not url:
        raise Exception("No URL specified")
    if len(url) > 2048:
        raise Exception(f"URL exceeds its maximum length of 2048 characters (given length={len(url)})")
    result = urllib.parse.urlparse(url)
    scheme = result.scheme
    domain = result.netloc
    if not scheme:
        raise Exception("No URL scheme specified")
    if not re.fullmatch(SCHEME_FORMAT, scheme):
        raise Exception(f"URL scheme must either be http(s) or ftp(s) (given scheme={scheme})")
    if not domain:
        raise Exception("No URL domain specified")
    if not re.fullmatch(DOMAIN_FORMAT, domain):
        raise Exception(f"URL domain malformed (domain={domain})")
    return url
Alternative: Using Validation Libraries
While regex lets you roll your own, Python has powerful validation libraries that take care of edge cases and oddities—saving you time and reducing bugs.
Using the validators Package
This package provides simple functions for validating URLs (and other data like emails and IP addresses):
import validators

print(validators.url("http://localhost:8000"))  # True
print(validators.url("ftp://invalid.com"))      # ValidationFailure (evaluates to False)
For more robust code, wrap the check so it always returns a boolean:
import validators
from validators import ValidationFailure

def is_string_an_url(url_string: str) -> bool:
    # Always strip whitespace before validating!
    result = validators.url(url_string.strip())
    return result is True  # Only True is valid; ValidationFailure is falsy
Examples:
print(is_string_an_url("http://localhost:8000"))   # True
print(is_string_an_url("http://.www.foo.bar/"))    # False
print(is_string_an_url("http://localhost:8000 "))  # True (after .strip())
Tip: Always trim leading and trailing spaces before validating URLs, as even a single space will cause most validators—including regex and libraries—to reject the input.
Other Approaches: RFC-Based Validation
If you want to validate URLs according to official standards, look into tools that implement RFC 3696, which defines recommendations for validating HTTP URLs and email addresses. For instance, you can use parser libraries (such as LEPL, though it’s no longer maintained) that follow these recommendations for higher accuracy in tricky cases.
A typical workflow with a parser library might look like this:
from lepl.apps.rfc3696 import HttpUrl

validator = HttpUrl()
print(validator('google'))             # False
print(validator('http://google'))      # False
print(validator('http://google.com'))  # True
While LEPL is archived, the above pattern shows how you might leverage standards-based parsing for edge cases that regex or general-purpose validators can miss. For modern projects, stick with maintained libraries, but it’s helpful to know these standards exist if you ever need to write your own validator or debug why something isn’t matching.
Validation Using Django’s URLValidator
If you’re already using Django, leverage its built-in URL validator for comprehensive checks:
from django.core.validators import URLValidator
from django.core.exceptions import ValidationError

def is_string_an_url(url_string: str) -> bool:
    validate_url = URLValidator()
    try:
        validate_url(url_string.strip())
        return True
    except ValidationError:
        return False
Examples:
print(is_string_an_url("https://example.com"))  # True
print(is_string_an_url("not a url"))            # False
Adding Django just for its URL validation is probably overkill, but if you’re in a Django project already, this is one of the most reliable approaches.
Bonus: Pydantic for Structured Validation
If you’re working with data models or APIs, Pydantic provides another robust way to validate URLs (and more) using Python type hints and schema validation. It’s especially handy when you want validation and structured error handling as part of your model definitions.
from requests import get, HTTPError, ConnectionError
from pydantic import BaseModel, AnyHttpUrl, ValidationError

class MyConfModel(BaseModel):
    URI: AnyHttpUrl

try:
    myAddress = MyConfModel(URI="http://myurl.com/")
    req = get(myAddress.URI, verify=False)
    print(myAddress.URI)
except ValidationError:
    print('Invalid destination')
Pydantic’s AnyHttpUrl will catch invalid URLs and raise a ValidationError. This is useful for ensuring that configuration, user input, or API parameters are valid before making requests or processing data.
Tested Patterns
Pydantic’s built-in validators are quite thorough. For example, the following URLs pass:
http://localhost
http://localhost:8080
http://example.com
http://user:password@example.com
http://_example.com
But these will fail validation:
http://&example.com
http://-example.com
If you need structured validation and meaningful error handling—especially in data models—Pydantic is a great addition to your toolkit.
Practical Testing and Edge Cases
Testing matters! Don’t forget to write cases for empty URLs, missing schemes, malformed domains, and subtle variants:
import pytest

def test_empty_url():
    with pytest.raises(Exception, match="No URL specified"):
        validate_url("")

def test_missing_scheme():
    with pytest.raises(Exception, match="No URL scheme specified"):
        validate_url("example.com")

def test_malformed_domain():
    with pytest.raises(Exception, match="URL domain malformed"):
        validate_url("http://.bad_domain")
Testing both the positive and negative cases ensures your validator does exactly what you expect—no more, no less.
Why Trim Whitespace Before URL Validation?
Before validating a URL, it’s essential to remove any leading or trailing spaces from the string. Even an extra space at the start or end—something easy to miss when copying and pasting—will cause most validation methods, including Python’s strict regex patterns, to treat the URL as invalid.
For example, "http://localhost:8000 "
(with a trailing space) will fail validation, even though the actual URL is fine. By using Python’s strip()
method, you ensure you’re testing the true URL as intended:
url = "http://localhost:8000 "
is_valid = is_string_an_url(url.strip())  # Returns True
Trimming whitespace helps your validations stay reliable, prevents false negatives, and ensures your applications don’t accidentally reject legitimate URLs due to minor copy-paste issues.
With these approaches—regex for quick checks, and libraries for thorough validation—you can confidently handle URL validation in a variety of Python projects.
A More Robust Solution: Comprehensive URL Validation
While the above regex patterns cover most everyday use-cases, URLs in the wild can be quite unpredictable. For bulletproof validation—recognizing everything from localhost
to exotic internationalized domains, and robustly excluding invalid edge cases—you may want something more thorough.
Here's a regex pattern that takes into account:
Protocols: Supports http, https, ftp, rtsp, rtp, and mmp
Authentication: Handles optional user:pass@ credentials
IP Addresses: Accepts public IPs, rejects private/local addresses (e.g., 127.0.0.1, 192.168.x.x)
Hostnames & International Domains: Supports Unicode characters and punycode
Ports: Optional, supports typical port ranges
Paths & Queries: Optional, matches any valid path, query string, or fragment
import re

ip_middle_octet = r"(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5]))"
ip_last_octet = r"(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))"

URL_PATTERN = re.compile(
    r"^"                                           # start of string
    r"(?:(?:https?|ftp|rtsp|rtp|mmp)://)"          # protocol
    r"(?:\S+(?::\S*)?@)?"                          # optional user:pass@
    r"("                                           # host/ip group
    r"(?:localhost)"                               # localhost
    r"|(?:(?:10|127)" + ip_middle_octet + r"{2}" + ip_last_octet + r")"          # 10.x.x.x, 127.x.x.x
    r"|(?:(?:169\.254|192\.168)" + ip_middle_octet + ip_last_octet + r")"        # 169.254.x.x, 192.168.x.x
    r"|(?:172\.(?:1[6-9]|2\d|3[0-1])" + ip_middle_octet + ip_last_octet + r")"   # 172.16.x.x - 172.31.x.x
    r"|(?:(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])" + ip_middle_octet + r"{2}" + ip_last_octet + r")"  # public IPs
    r"|(?:(?:[a-z\u00a1-\uffff0-9_-]-?)*[a-z\u00a1-\uffff0-9_-]+"
    r"(?:\.(?:[a-z\u00a1-\uffff0-9_-]-?)*[a-z\u00a1-\uffff0-9_-]+)*"
    r"(?:\.(?:[a-z\u00a1-\uffff]{2,})))"           # domain names with TLD
    r")"
    r"(?::\d{2,5})?"                               # optional port
    r"(?:/\S*)?"                                   # optional resource path
    r"(?:\?\S*)?"                                  # optional query
    r"$",
    re.UNICODE | re.IGNORECASE
)

def url_validate(url):
    """URL string validation"""
    return URL_PATTERN.match(url)
Why use this?
If you’re building forms or tools that need to reliably validate user-submitted URLs—including those with edge-case hostnames or public IP addresses—this pattern will catch what simpler regexes might miss. For example, it will recognize http://sub.例子.测试:8080/path?foo=bar, and because the private/local ranges are broken out into their own alternatives, it is straightforward to add a check that flags a string like http://192.168.1.1, which is a private IP.
Choose the right level of strictness for your needs:
For simple checks (e.g., ensuring a URL looks legit), the first few regexes are fast and easy.
If you need enterprise-grade validation or want to be sure you’re not letting through malformed or local network URLs, the comprehensive solution above is your friend.
Extending URL Validation for IPv6 Support
To make your URL validation regex compatible with IPv6 addresses, you’ll need to do two things:
Integrate a robust IPv6 validator regex (for example, from a trusted library or resource like Markus Jarderot’s pattern).
Adjust your URL parsing logic so it can recognize and accept IPv6 notation within URLs. This typically involves allowing square brackets around the IP portion (e.g., http://[2001:db8::1]:8080/).
A sample step in your validation routine could look like this:
When parsing the domain or host part of the URL, check if it’s an IPv6 address using your IPv6 validator. If so, ensure it matches the expected bracketed format for URLs.
By adding these enhancements, your validator will be able to handle URLs featuring IPv6 addresses alongside standard domain names or IPv4 addresses.
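As one possible sketch of that routine—leaning on the standard library's ipaddress module instead of a hand-rolled IPv6 regex (the helper name is illustrative):
from urllib.parse import urlparse
import ipaddress
import re

def host_is_valid(url: str) -> bool:
    """Accept domain names, IPv4 hosts, and bracketed IPv6 hosts."""
    parsed = urlparse(url.strip())
    host = parsed.hostname  # urlparse strips the [ ] brackets from IPv6 hosts
    if not host:
        return False
    # Try IPv4/IPv6 first; ipaddress handles both notations
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        pass
    # Fall back to a simple domain-name check
    return bool(re.fullmatch(r"[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}|localhost", host))

print(host_is_valid("http://[2001:db8::1]:8080/"))  # True
print(host_is_valid("http://example.com/"))         # True
print(host_is_valid("http://not_a_host"))           # False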
Ensuring Your String Is a Single Valid URL
A common pitfall in URL validation is accidentally matching inputs like http://www.google.com/path,www.yahoo.com/path
as a single valid URL, when it's really two URLs separated by a comma. To prevent this and ensure your string is exactly one, clean, valid URL, follow these tips:
Anchor the regex: Always use ^ (start) and $ (end) in your pattern. This way, only a string that is a single URL—nothing more, nothing less—will be accepted.
Avoid matching delimiters: Do not allow characters such as commas or spaces after (or before) the URL in your regex.
No partial matches: Use the fullmatch() method rather than match() or search(). It checks if the whole string matches your pattern—not just a part of it.
Here's how your validation logic should look in Python:
import re

def is_strict_single_url(url):
    # Regex allows http/https, domains, subdomains, and optional paths/queries
    pattern = re.compile(r'^(http|https):\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/[a-zA-Z0-9\-._~:/?#[\]@!$&\'()*+,;=]*)?$')
    return bool(pattern.fullmatch(url))
By using pattern.fullmatch(url)
, any extra commas, whitespace, or multiple URLs in the string will cause validation to fail—ensuring only single, proper URLs get through.
Enforcing a Maximum URL Length
To make sure your URLs don’t sneak past a set maximum length—commonly 2048 characters—you can add a simple length check before validating the rest of the URL. This is useful for keeping your forms and applications safe from overly long or potentially malicious links.
Here’s what you can do:
Trim whitespace from the input to avoid counting accidental spaces.
Check the length of the URL string.
Raise an error or reject the URL if it’s too long.
For example, before running your usual regex or validation logic:
MAX_URL_LENGTH = 2048

url = url.strip()
if len(url) > MAX_URL_LENGTH:
    raise ValueError(f"URL exceeds the maximum length of {MAX_URL_LENGTH} characters (got {len(url)})")

# Proceed with your normal URL validation checks here
This way, you immediately filter out any URLs that overshoot your preferred limit, keeping your processing tight and controlled. In most web and API environments, 2048 characters is a practical upper bound—used by browsers like Chrome and tools such as Postman—so it’s a solid default.
Python Example Code
import re

def is_valid_url(url):
    pattern = re.compile(r'^(http|https):\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/[a-zA-Z0-9\-._~:/?#[\]@!$&\'()*+,;=]*)?$')
    return bool(pattern.fullmatch(url))

# Test URLs
print(is_valid_url("https://qodex.ai"))         # True
print(is_valid_url("http://example.com/path"))  # True
print(is_valid_url("ftp://invalid.com"))        # False
Try variations using the Python Regex Tester.
How to Split and Validate a URL Using urllib.parse and Regex
To thoroughly validate a URL in Python, you'll often need to do more than just match the full string with a single regex. Here's a flexible approach combining Python’s urllib.parse
with targeted regular expressions for each URL component:
Break Down the URL:
Use urllib.parse.urlparse() to split your URL into its core parts:
Scheme (http, https)
Netloc (domain, subdomain, and optional port)
Path, query, fragment, etc.
Validate Each Piece:
Apply regular expressions to the components that matter most for your use case:
Scheme: Ensure it’s http or https.
Netloc: Confirm it’s a valid domain name or IP address, and optionally check for a port (e.g., example.com:8080).
Path: If needed, add checks for valid characters in the path segment.
IP Address Support:
If your URLs might contain IP addresses instead of domain names, include regex patterns capable of matching IPv4 addresses. For IPv6 support, use a specialized IPv6 validator—such as Markus Jarderot’s widely regarded regex—for robust parsing.
Example Workflow:
Parse the URL:
from urllib.parse import urlparse
import re

url = "https://127.0.0.1:5000/home"
parsed = urlparse(url)
Validate scheme:
if parsed.scheme not in ["http", "https"]:
    ...  # Handle invalid scheme
Validate netloc (domain or IP, with optional port):
domain_pattern = r"^[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
ipv4_pattern = r"^\d{1,3}(\.\d{1,3}){3}$"
port_pattern = r":\d{2,5}$"

netloc = parsed.netloc.split(':')[0]  # Extract domain/IP
For IPv6, integrate a dedicated validation function to accurately detect and confirm legitimate IPv6 addresses.
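Putting those steps together, a minimal end-to-end sketch (the helper name and return convention are our own choices) might look like this:
import re
from urllib.parse import urlparse

DOMAIN_PATTERN = re.compile(r"^[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def validate_components(url: str) -> bool:
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return False                   # only accept the schemes we care about
    host = parsed.hostname or ""       # hostname excludes any :port suffix
    return bool(
        host == "localhost"
        or DOMAIN_PATTERN.fullmatch(host)
        or IPV4_PATTERN.fullmatch(host)
    )

print(validate_components("https://127.0.0.1:5000/home"))  # True
print(validate_components("https://example.com/path"))     # True
print(validate_components("ftp://example.com"))            # False (scheme not allowed)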
This modular technique gives you fine-grained control: you can adapt the regex and logic to your specific form, crawler, or parser requirements. It’s a great way to manage tricky edge cases that simple string-wide regex approaches might miss.
Detecting Potential URLs Beyond Strict Patterns
Sometimes, you may need to recognize tokens that could be URLs—even if they're not perfectly formed. For instance, you might encounter strings like yandex.ru.com/somepath or www.example.org, which don’t always match the strictest regex but still represent URLs in practice.
To address this, consider checking two things:
Does the text start with common URL schemes or prefixes (like http, www, or ftp)?
Does it end with a valid public domain suffix?
Here's a practical Python example that fetches an up-to-date list of public domain suffixes and uses them to identify likely URLs:
import requests

def get_domain_suffixes():
    res = requests.get('https://publicsuffix.org/list/public_suffix_list.dat')
    suffixes = set()
    for line in res.text.splitlines():
        if not line.startswith('//'):
            domains = line.split('.')
            cand = domains[-1]
            if cand:
                suffixes.add('.' + cand)
    return tuple(sorted(suffixes))

domain_suffixes = get_domain_suffixes()

def reminds_url(txt: str):
    """
    Returns True if the text looks like a URL.

    Example:
    >>> reminds_url('yandex.ru.com/somepath')
    True
    """
    ltext = txt.lower().split('/')[0]
    return ltext.startswith(('http', 'www', 'ftp')) or ltext.endswith(domain_suffixes)
This approach is especially useful for quick validation or preprocessing, where you want to capture URLs even if they're missing a protocol or have unusual structures.
Handling Python 2 and Python 3 for URL Validation
Python’s urlparse
module is a handy way to validate URLs, but the import path changes between Python 2 and Python 3. Here’s how to ensure compatibility and robust URL checking across both versions.
Cross-Version Import
Depending on your environment, you’ll want to handle the import gracefully:
try:
    # For Python 2
    from urlparse import urlparse
except ImportError:
    # For Python 3
    from urllib.parse import urlparse
Example Function for URL Validation
After importing urlparse, you can create a simple validator function:
def is_valid_url(url):
    try:
        result = urlparse(url)
        return all([result.scheme, result.netloc])
    except (AttributeError, TypeError):
        return False
This function checks for both a scheme (like http or https) and a network location, filtering out partial paths, numbers, and malformed strings.
Sample Usage
urls = [
    "http://www.cwi.nl:80/%7Eguido/Python.html",  # Valid
    "/data/Python.html",                          # Invalid (missing scheme)
    532,                                          # Invalid (not a string)
    u"dkakasdkjdjakdjadjfalskdjfalk",             # Invalid (nonsense string)
    "https://qodex.ai"                            # Valid
]

results = [is_valid_url(u) for u in urls]
print(results)  # Output: [True, False, False, False, True]
This approach keeps your validation logic compatible regardless of whether you're running Python 2 or Python 3. And of course, it’s a good companion to using regex for more nuanced rules.
Validating URLs with Pydantic
Another slick option for URL validation in Python is the Pydantic library. While it’s most famous for parsing and validating data for FastAPI and configuration models, Pydantic actually provides a robust set of URL data types out of the box.
Pydantic’s URL Types: A Quick Overview
Pydantic comes with several helpful field types—perfect if you need more specificity than just “any old URL.” For example:
AnyUrl: Accepts nearly all valid URLs, including custom schemes.
AnyHttpUrl: Restricts to HTTP and HTTPS URLs.
HttpUrl: Demands HTTP/HTTPS, includes checks for host and TLD.
FileUrl, PostgresDsn, etc.: Specialized for files or specific database connections.
Refer to the documentation for a full list of options and scheme support.
How to Use with a Minimal Example
Here’s a typical usage pattern with Pydantic:
from pydantic import BaseModel, AnyHttpUrl, ValidationError

class Config(BaseModel):
    endpoint: AnyHttpUrl  # or choose the URL type you need

try:
    conf = Config(endpoint="http://localhost:8080")
    print(conf.endpoint)  # Will print a validated URL
except ValidationError:
    print("Not a valid HTTP(s) URL")
Attempting to create a model with an invalid URL will raise a ValidationError you can catch to handle input errors gracefully.
Pydantic also helps clarify why a value is invalid in its error messages.
Limitations and Gotchas
While Pydantic’s validators are thorough, keep in mind:
Some schemes (like ftp or database DSNs) require AnyUrl or more specific types (like PostgresDsn).
The strictness of validation depends on which field type you pick.
Leading/trailing spaces should be trimmed before assignment (Pydantic will usually do this, but don’t rely on it for noisy or poorly sanitized input).
Sample URLs and Outcomes
Here’s a taste of how Pydantic’s AnyHttpUrl responds:
"http://localhost" – valid
"http://localhost:8080" – valid
"http://user:password@example.com" – valid
"http://_example.com" – valid (underscore accepted)
"http://&example.com" – invalid (symbol not allowed)
"http://-example.com" – invalid (hyphen at start is rejected)
For comprehensive URL checks, Pydantic combines convenience with clarity—making your data models safer with minimal effort.
Checking the Latest Public Suffixes for Domain Validation
Sometimes, validating a URL or domain isn’t just about confirming the syntax—especially if you want to ensure your code recognizes valid top-level domains (TLDs) and public suffixes. To stay current with domain extensions (including newer ones like .dev, .app, or .io), you can programmatically retrieve the official public suffix list maintained by Mozilla.
Here’s a simple Python approach that pulls the latest list directly from publicsuffix.org and extracts all recognized domain suffixes:
import requests

def fetch_public_suffixes():
    response = requests.get('https://publicsuffix.org/list/public_suffix_list.dat')
    suffixes = set()
    for line in response.text.splitlines():
        line = line.strip()
        # Skip comments and empty lines
        if line and not line.startswith('//'):
            suffixes.add('.' + line)
    return tuple(sorted(suffixes))

# Fetch the latest suffixes
domain_suffixes = fetch_public_suffixes()
What this does:
Downloads the current public suffix list.
Ignores comments and empty lines in the dataset.
Collects each suffix into a tuple for easy lookups.
This technique helps ensure your domain validation logic is aware of every TLD currently recognized by major browsers and libraries—so you’re not blindsided by new suffixes.
Use this in your URL or email checker to make your validations future-proof and standards-compliant.
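For instance, assuming the domain_suffixes tuple built above, a quick suffix check might look like this (a sketch; the helper name is illustrative):
def has_known_suffix(hostname: str) -> bool:
    """Return True if the hostname ends with a recognized public suffix."""
    return hostname.lower().endswith(domain_suffixes)

print(has_known_suffix("example.dev"))          # True (assuming .dev is in the fetched list)
print(has_known_suffix("example.notarealtld"))  # False (not a recognized suffix)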
When Should You Do a DNS Check?
A quick note for thoroughness: validating a URL's format—whether using regex, the validators
package, or Django’s built-in tools—only ensures the string looks like a URL. It doesn’t tell you whether that URL actually exists or leads to a live destination.
That’s where DNS checks come in. If you truly need to confirm that a URL points to a real, resolvable domain (e.g., verifying "https://www.google" isn’t just well-formed, but actually goes somewhere), you’ll need to go a step further by performing a DNS lookup. This process asks, "Does this domain exist on the internet right now?"—something no regex or typical package will answer for you.
DNS checks aren’t always necessary for basic validation tasks like form inputs or static checks. But, if you’re building anything that relies on external connectivity (think: crawlers, link checkers, or automated testing tools), adding a DNS resolution step is a good way to catch invalid or unavailable domains before they cause trouble later in your workflow.
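If you do need that extra step, the standard library's socket module is enough for a basic resolution check (a sketch; it only confirms the hostname resolves, not that the full URL responds):
import socket
from urllib.parse import urlparse

def domain_resolves(url: str) -> bool:
    """Return True if the URL's hostname resolves via DNS."""
    hostname = urlparse(url.strip()).hostname
    if not hostname:
        return False
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

print(domain_resolves("https://www.google.com"))          # True (on a machine with internet access)
print(domain_resolves("https://no-such-domain.example"))  # False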
Using the validators Package for URL Validation
If you prefer not to write your own regex, you can easily check if a string is a valid URL by using the popular validators Python package. Here’s a straightforward approach:
import validators

def is_valid_url(url_string):
    result = validators.url(url_string)
    return result is True  # Returns True if valid, False otherwise

# Example usage
print(is_valid_url("http://localhost:8000"))  # True
print(is_valid_url("http://.www.foo.bar/"))   # False
The function is_valid_url returns True only when the provided string passes all URL checks performed by the library.
Internally, validators.url() returns True when valid, or a ValidationFailure object when not—so this function keeps things simple.
Use this for quick, robust validation without wrangling regex patterns.
Quick tip: Always strip spaces before validation, especially if the URL is coming from user input or copy-paste operations.
This approach is efficient, readable, and saves you from reinventing the wheel when working with URLs in Python.
Using Python Standard Library for URL Validation
Prefer to stick with the standard library? You can use urlparse (available via urllib.parse in Python 3 and urlparse in Python 2) to check whether a string is structured like a URL—without installing any third-party libraries.
Here's a basic approach:
try:
    # Python 2
    from urlparse import urlparse
except ImportError:
    # Python 3
    from urllib.parse import urlparse

def is_valid_url(string):
    try:
        result = urlparse(str(string).strip())
        # Validation: must have scheme (like http/https) and netloc (domain)
        return all([result.scheme, result.netloc])
    except Exception:
        return False
Examples:
print(is_valid_url('http://www.python.org'))  # True
print(is_valid_url('/just/a/path/file.txt'))  # False
print(is_valid_url(12345))                    # False
print(is_valid_url('not a url at all'))       # False
print(is_valid_url('https://github.com'))     # True
Note:
URL parsing checks structure, not whether the URL is actually reachable on the internet. For more extensive validation (including syntax and even DNS lookups), consider using packages like validators or requests. But for basic checks, urlparse fits the bill.
Making Validated URL Objects Act Like Strings
Say you’ve wrapped URL validation inside a custom class—how do you make sure your objects still behave like regular strings throughout your codebase? It’s simple: just ensure your class inherits from str
directly, or implements the required string methods. This way, once a URL has passed your checks, you can use it anywhere a string is expected.
For example:
class ReachableURL(str):
    def __new__(cls, url):
        # Validate the URL here...
        # (Assume validation passes for this example)
        return str.__new__(cls, url)
Now, instances of ReachableURL
can be used seamlessly—just like ordinary strings:
url_instance = ReachableURL("http://example.com")
print(isinstance(url_instance, str))  # True
print(url_instance.upper())           # HTTP://EXAMPLE.COM
This approach lets you layer extra functionality (like validation or reachability checks) while retaining all the familiar power of Python’s string operations. So, whether you’re concatenating, slicing, or handing off URLs to other libraries, you’ll keep everything as clean and Pythonic as possible.
Using Django’s URLValidator to Check URLs in Python
Django comes with a handy built-in tool for validating URLs: the URLValidator
from the django.core.validators
module. This validator is designed to determine whether a given string matches the criteria for a valid web address. If you’re already using Django in your project, it’s a convenient and reliable approach for URL validation.
Here’s how you can use it:
Import the Necessary Classes:
URLValidator for the actual validation
ValidationError to handle invalid cases
Write a Simple Validation Function:
from django.core.validators import URLValidator
from django.core.exceptions import ValidationError

def is_valid_url(url: str) -> bool:
    validator = URLValidator()
    try:
        validator(url)
        return True
    except ValidationError:
        return False
When you call validator(url), it checks if the url string adheres to standard URL patterns.
If the supplied value isn’t a valid URL, it raises a ValidationError. The function returns True for valid URLs and False for invalid ones.
While Django’s URLValidator
is powerful, keep in mind that adding Django as a dependency may be unnecessary for lightweight projects. However, for those already using Django, it’s a robust option for all your URL validation needs.
How validators.url Works in Python
If you're looking for a quick, reliable way to check whether a string is a valid URL in Python, the validators
package is a handy tool. Its url
function makes URL validation straightforward—even for tricky cases.
How It Validates
Pass your URL as a string to validators.url().
If the input is a valid URL, you'll get True as the result.
If the URL is not valid, instead of a simple False, it returns an object called ValidationFailure. While this might feel a bit unexpected, it still makes it easy to know whether your URL passes or fails validation.
Example Usage
Here’s what a typical validation flow might look like:
import validators
from validators import ValidationFailure

def is_string_a_url(candidate: str) -> bool:
    result = validators.url(candidate)
    return False if isinstance(result, ValidationFailure) else result

# Checking results
print(is_string_a_url("http://localhost:8000"))  # Outputs: True
print(is_string_a_url("http://.www.foo.bar/"))   # Outputs: False
This approach ensures you're only working with recognized, well-formed URLs—perfect for situations where data quality matters most.
Enforcing URL Validation with Python Classes and Type Checking
If you want to enforce URL validation at a deeper level across your codebase—not just at input time or form submission—you can use Python’s class inheritance and type checking. By encapsulating validation logic inside custom types, you make it much harder for invalid URLs to sneak into your application logic or data models.
Example: Creating a URL Type with Built-in Validation
You can define a custom string subclass that validates its input every time it’s instantiated. This approach leverages Python’s rich data model and is especially useful if you’re working in larger codebases, or when you want your function signatures and type hints to truly mean “URL—not just any string!”
Here’s a typical pattern using standard library features and popular utilities:
from urllib.parse import urlparse

class URL(str):
    def __new__(cls, value: str):
        result = urlparse(value)
        # Only allow non-empty scheme and netloc (host/address)
        if not (result.scheme and result.netloc):
            raise ValueError(f"Invalid URL: {value!r}")
        return str.__new__(cls, value)
Usage Example:
site = URL("https://wikipedia.org")  # works fine
another = URL("not a url")           # raises ValueError
Any attempt to create a URL
object with an invalid address immediately results in an error, so only valid URLs can be used downstream.
Benefits of Using Custom Types
Early Validation: Problems surface instantly at the object creation stage.
Type Safety: Your IDE and static analysis tools (e.g., mypy) can help catch mistakes when you annotate with your custom URL type.
Cleaner Code: Functions and classes that require URLs can explicitly declare so, boosting readability and reducing runtime surprises.
Extending Functionality
For stricter checks—such as ensuring the URL is reachable, uses HTTPS, or isn’t a localhost address—simply extend your base URL class and add additional validation in __new__.
import socket
from urllib.parse import urlparse

class ReachableURL(URL):
    def __new__(cls, value: str):
        instance = super().__new__(cls, value)
        hostname = urlparse(instance).hostname
        if not hostname:
            raise ValueError(f"Invalid URL: {value!r}")
        try:
            socket.gethostbyname(hostname)
        except socket.error:
            raise ValueError(f"Hostname not resolvable: {hostname}")
        return instance
With this approach, your URL handling code stays explicit, self-documenting, and robust—whether you’re writing a web crawler, building APIs with FastAPI or Django, or just aiming for cleaner domain models.
Django’s URL Validator vs. Standalone Packages
If you're considering URL validation for your project, you might wonder whether to rely on Django’s built-in utility or opt for a lightweight standalone package. Here’s a quick breakdown to help you weigh the options:
Advantages of Django’s URL Validator:
Comprehensive Checks: Django’s validator is well-tested and supports standard URL patterns (older Django versions even offered a verify_exists option to check whether a URL actually exists, though it has since been removed).
Integration: Seamlessly fits within the rest of Django’s validation ecosystem, making it perfect for projects already using Django for forms or models.
Community and Documentation: You benefit from a large, active community and thorough documentation—making it easier to troubleshoot or extend.
Drawbacks to Consider:
Dependency Bloat: Including Django just for URL validation can be overkill—Django is a robust, full-featured framework, and significantly increases your project’s size and dependencies if you’re not already using it.
Complexity: For smaller scripts, microservices, or non-Django projects, a standalone library (such as validators or simple regex-based checks) will keep things lean and more easily maintained.
Performance: Extra dependencies sometimes add startup time and potential version conflicts, especially in minimalist environments.
Summary:
If your stack already uses Django, leveraging its URL validator is a solid and hassle-free choice. For lightweight projects or scripts, standalone validation packages or tailored regex rules will keep your footprint minimal and setup easier. Choose based on your project’s needs and existing tech stack!
Use Cases
Form Validation: Ensure users submit well-structured URLs in web forms.
Data Cleaning: Remove or fix malformed links in large datasets.
Crawlers & Scrapers: Verify URLs before crawling or scraping content.
Security Filtering: Block suspicious or malformed URLs from being stored or executed.
Useful tools:
IP Address Regex Python Validator – for network-based URL checks
Email Regex Python Validator – to validate contact forms
Password Regex Python Validator – when securing user input alongside URLs
Categorized Regex Metacharacters
^ : Matches the start of the string
$ : Matches the end of the string
. : Matches any character (except newline)
+ : Matches one or more of the previous token
* : Matches zero or more of the previous token
? : Makes the preceding token optional
[] : Matches any one character inside the brackets
() : Groups patterns
| : OR operator
\ : Escapes special characters (e.g., \. for a literal dot)
Pro Tips
Always use raw strings (r'') in Python to avoid escaping issues.
Add anchors ^ and $ to match the full URL and avoid partial matches.
Use non-capturing groups (?:...) for cleaner matching if needed.
Test localhost or custom ports using a regex like: localhost:\d{2,5}
Combine this validator with IP Address Regex Python Validator for APIs or internal tools.
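A short sketch pulling these tips together (raw string, anchors, non-capturing groups, and a localhost port check; the sample URLs are ours) might look like:
import re

# Raw string, anchored, non-capturing scheme group, optional port
pattern = re.compile(r'^(?:http|https)://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(?::\d{2,5})?(?:/.*)?$')
localhost_pattern = re.compile(r'^localhost:\d{2,5}$')

print(bool(pattern.fullmatch("https://qodex.ai:8443/docs")))  # True
print(bool(localhost_pattern.fullmatch("localhost:8000")))    # True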
More on Domain and Port Validation
When validating URLs, it's important to remember that the regex should handle both the scheme (like http or https) and the domain (or netloc) parts of the URL. The domain section includes everything up to the first slash /, so port numbers (like :8000) are safely included in this part of the match.
For example:
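Here is a minimal sketch of that idea—the netloc group captures the host and any port before the first slash (the values are illustrative):
import re

pattern = re.compile(r'^(http|https)://([a-zA-Z0-9.-]+(?::\d{2,5})?)(/.*)?$')

m = pattern.fullmatch("http://example.com:8000/home")
print(m.group(2))  # example.com:8000 -- the port stays inside the netloc match

m = pattern.fullmatch("http://127.0.0.1:5000/api")
print(m.group(2))  # 127.0.0.1:5000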
This approach ensures that your validator can match standard domains, custom ports, and even IPv4 addresses.
Supporting IPv6 Addresses
If you need to validate URLs containing IPv6 addresses, consider enhancing your regex or integrating a specialized IPv6 validator. A comprehensive IPv6 regex will handle the full range of valid address formats, so incorporate a solution like Markus Jarderot's IPv6 validator for best results. Remember to check both the domain and the IP format when validating.
Examples in Action
IPv4 and Alphanumeric Domains:
Use a regex that matches standard domains and IPv4 addresses. For reference, tools like the Python Regex Tester can help you test and refine your patterns.
IPv6 Support:
With the right regex, you can capture URLs using IPv6 addresses, ensuring your validation routine is robust for any environment—including internal networks or modern APIs.
By combining these approaches, your URL validation will be flexible enough for everything from localhost development to production-grade, multi-protocol endpoints.
Frequently asked questions
Discover, Test, and Secure your APIs — 10x Faster.

Product
All Rights Reserved.
Copyright © 2025 Qodex
Discover, Test, and Secure your APIs — 10x Faster.

Product
All Rights Reserved.
Copyright © 2025 Qodex
URL Regex Python Validator
Search...
⌘K
URL Regex Python Validator
Search...
⌘K


URL Regex Python Validator
URL Regex Python Validator
Use the URL Regex Python Validator to accurately test patterns for validating website links in Python. Whether you’re checking for http, https, or complex paths, this tool helps ensure your URLs are clean, correct, and reliable. For more regex testing, explore the Python Regex Tester, Email Regex Python Validator, or IP Address Regex Python Validator.
[A-Z]
: uppercase letters[a-z]
: lowercase letters[0-9]
: digits\.
: a literal dot+
: one or more of the preceding*
: zero or more of the preceding?
: optional (zero or one)^
: start of string$
: end of string
Test your APIs today!
Write in plain English — Qodex turns it into secure, ready-to-run tests.
URL Regex Python Validator - Documentation
What is the URL Regex Python Validator?
The URL Regex Python Validator is designed to help you check whether your regular expressions correctly match valid web addresses. This includes checking for:
Protocols like http or https
Domain names and subdomains
Optional ports, paths, query parameters, and fragments
It uses Python’s re module and is ideal for form validation, web crawling, data parsing, and link-checking tasks.
Common URL Regex Patterns
Basic HTTP/HTTPS URL
^(http|https):\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Matches: http://example.com, https://qodex.ai
Does not match: example.com, ftp://server.com
Full URL with Optional Paths & Queries
^(http|https):\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/[a-zA-Z0-9\-._~:/?#[\]@!$&'()*+,;=]*)?$
Matches: https://site.com/path?search=value, http://domain.org
With Optional Port
^(http|https):\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(:\d{2,5})?(\/.*)?$
Matches: http://localhost:8000/home, https://api.site.com:443/v1
What Does the Regex Actually Check?
HTTP Basic Authentication: Optionally allows for before the domain.
Domain Structure: Enforces at least one subdomain (e.g., ), accepts dashes within subdomains, and ensures each subdomain and the top-level domain don't exceed length limits (max 63 characters each, full domain under 253).
Top-Level Domain: Only allows alphanumeric characters (no dashes).
Localhost Support: Accepts as a valid domain.
Port Numbers: Optionally matches ports up to 5 digits (e.g., ).
IPv4 Addresses: Recognizes standard IPv4 addresses in the netloc.
IPv6 Addresses: For IPv6 validation, you’ll want to supplement with a dedicated IPv6 validator, as the regex alone may not cover all edge cases.
Handling Complex and Edge Cases
While the above regex patterns cover most use-cases, URLs in the wild can be tricky—especially with top-level domains like .co.uk
or unconventional subdomain structures. If you need a more robust solution that accounts for these "weird cases," consider a regex pattern that also allows for:
Optional protocol (e.g.,
http://
,https://
, or none)Optional subdomains (like
www.
)Support for multi-part TLDs (e.g.,
co.uk
)Paths, query strings, and fragments
Hyphens in domain names
Example Enhanced Regex (Python-style)
regex = re.compile( r"(\w+://)?" # protocol (optional) r"(\w+\.)?" # subdomain (optional) r"(([\w-]+)\.(\w+))" # domain r"(\.\w+)*" # additional TLD parts (optional) r"([\w\-\.\_\~/]*)" # path, query, fragments (optional) )
Test Cases for Thoroughness
This more flexible approach will match a variety of real-world URLs, such as:
http://www.google.com
https://google.co.uk
google.com/~user/profile
www.example.org
https://sub.domain.co.uk/path/to/page
example.com
.google.com
(edge case, may require post-processing)
By testing against a broad set of examples—including those with extra dots, missing subdomains, or unusual TLDs—you can ensure your regex is both comprehensive and resilient.
Feel free to adjust the patterns and test cases as needed to suit the specific requirements of your application. Regex is powerful, but always test thoroughly to avoid surprises!
Advanced Regex for Edge Cases
A more thorough regex can handle authentication, IPv4/IPv6 addresses, localhost, port numbers, and more:
import re DOMAIN_FORMAT = re.compile( r"(?:^(\w{1,255}):(.{1,255})@^)" # http basic authentication [optional] r"(?:(?:(?=\S{0,253}(?:$:)))" r"((?:?\.)+" # subdomains r"(?:[a-z0-9]{1,63})))" # top level domain r"localhost" r")(:\d{1,5})?", # port [optional], re.IGNORECASE ) SCHEME_FORMAT = re.compile( r"^(httphxxpftpfxp)s?$", # scheme: http(s) or ftp(s) re.IGNORECASE ) from urllib.parse import urlparse def validate_url(url: str): url = url.strip() if not url: raise Exception("No URL specified") if len(url) > 2048: raise Exception(f"URL exceeds its maximum length of 2048 characters (given length={len(url)})") result = urlparse(url) scheme = result.scheme domain = result.netloc if not scheme: raise Exception("No URL scheme specified") if not re.fullmatch(SCHEME_FORMAT, scheme): raise Exception(f"URL scheme must either be http(s) or ftp(s) (given scheme={scheme})") if not domain: raise Exception("No URL domain specified") if not re.fullmatch(DOMAIN_FORMAT, domain): raise Exception(f"URL domain malformed (domain={domain})") return url
This approach splits the URL and validates the scheme and domain separately, handling a wider array of valid URLs (including those with authentication and ports). For even greater accuracy (such as validating IPv6), you might want to add an IPv6 validator.
Alternative: Using Validation Libraries
While regex is great for quick URL checks, Python has some powerful validation libraries that can save you time and headaches, especially when edge cases start popping up.
Using the Package
The package provides simple functions for validating URLs (and many other types of data like emails and IP addresses). Here’s how you can use it:
import validators print(validators.url("http://localhost:8000")) # True print(validators.url("ftp://invalid.com")) # ValidationFailure object (evaluates to False)
For more robust code, consider wrapping this check to always return a boolean:
import validators from validators import ValidationFailure def is_string_an_url(url_string: str) -> bool: # Always strip whitespace before validating! Result = validators.url(url_string.strip()) return result is True # Only True is valid; ValidationFailure is falsy
Examples
print(is_string_an_url("http://localhost:8000")) # True print(is_string_an_url("http://.www.foo.bar/")) # False print(is_string_an_url("http://localhost:8000 ")) # True (after .strip())
Tip: Always trim leading and trailing spaces before validating URLs, as even a single space will cause most validators—including regex and libraries like these—to reject the input.
Even More Powerful: Pydantic and Django
If you’re using frameworks like Pydantic or Django, you get validation utilities that handle a lot of this for you:
Validation Using Django’s URLValidator
If you’re already using Django, leverage its built-in URL validator for comprehensive checks:
from django.core.validators import URLValidator from django.core.exceptions import ValidationError def is_string_an_url(url_string: str) -> bool: validate_url = URLValidator() try: validate_url(url_string.strip()) return True except ValidationError: return False
Examples
print(is_string_an_url("https://example.com")) # True print(is_string_an_url("not a url")) # False
Adding Django just for its URL validation is probably overkill, but if you’re in a Django project already, this is one of the most reliable approaches.
Validation Using Pydantic
Pydantic offers types like AnyHttpUrl
for strict URL validation:
from pydantic import BaseModel, AnyHttpUrl, ValidationError class MyConfModel(BaseModel): URI: AnyHttpUrl try: myAddress = MyConfModel(URI="http://myurl.com/") print(myAddress.URI) except ValidationError: print('Invalid destination')
This approach raises exceptions for invalid URLs and supports a variety of URL types.
With these approaches—regex for quick checks, and libraries for thorough validation—you can confidently handle URL validation in a variety of Python projects. Whether you want something lightweight for a script or full-featured for an enterprise app, Python’s got you covered.
Beyond Regex: Defensive Validation
While regex is great for quick URL checks, rigorous validation often means adding more logic. Consider these defensive steps:
Trim whitespace before validation—accidental spaces cause most validators to reject otherwise valid URLs.
Check for empty input and enforce reasonable length limits (e.g., 2048 characters).
Validate scheme: Only allow
http
,https
, or your required protocols.Domain verification: Use regex or libraries to ensure the domain is well-formed.
Here’s an example of thorough validation logic:
import re import urllib.parse SCHEME_FORMAT = r"https?ftp" DOMAIN_FORMAT = r"[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" def validate_url(url: str): url = url.strip() if not url: raise Exception("No URL specified") if len(url) > 2048: raise Exception(f"URL exceeds its maximum length of 2048 characters (given length={len(url)})") result = urllib.parse.urlparse(url) scheme = result.scheme domain = result.netloc if not scheme: raise Exception("No URL scheme specified") if not re.fullmatch(SCHEME_FORMAT, scheme): raise Exception(f"URL scheme must either be http(s) or ftp(s) (given scheme={scheme})") if not domain: raise Exception("No URL domain specified") if not re.fullmatch(DOMAIN_FORMAT, domain): raise Exception(f"URL domain malformed (domain={domain})") return url
Alternative: Using Validation Libraries
While regex lets you roll your own, Python has powerful validation libraries that take care of edge cases and oddities—saving you time and reducing bugs.
Using the validators
Package
This package provides simple functions for validating URLs (and other data like emails and IP addresses):
import validators print(validators.url("http://localhost:8000")) # True print(validators.url("ftp://invalid.com")) # ValidationFailure (evaluates to False)
For more robust code, wrap the check so it always returns a boolean:
import validators from validators import ValidationFailure def is_string_an_url(url_string: str) -> bool: # Always strip whitespace before validating! Result = validators.url(url_string.strip()) return result is True # Only True is valid; ValidationFailure is falsy
Examples:
print(is_string_an_url("http://localhost:8000")) # True print(is_string_an_url("http://.www.foo.bar/")) # False print(is_string_an_url("http://localhost:8000 ")) # True (after .strip())
Tip: Always trim leading and trailing spaces before validating URLs, as even a single space will cause most validators—including regex and libraries—to reject the input.
Other Approaches: RFC-Based Validation
If you want to validate URLs according to official standards, look into tools that implement , which defines recommendations for validating HTTP URLs and email addresses. For instance, you can use parser libraries (such as LEPL, though it’s no longer maintained) that follow these recommendations for higher accuracy in tricky cases.
A typical workflow with a parser library might look like this:
from lepl.apps.rfc3696 import HttpUrl

validator = HttpUrl()
print(validator('google'))             # False
print(validator('http://google'))      # False
print(validator('http://google.com'))  # True
While LEPL is archived, the above pattern shows how you might leverage standards-based parsing for edge cases that regex or general-purpose validators can miss. For modern projects, stick with maintained libraries, but it’s helpful to know these standards exist if you ever need to write your own validator or debug why something isn’t matching.
Validation Using Django’s URLValidator
If you’re already using Django, leverage its built-in URL validator for comprehensive checks:
from django.core.validators import URLValidator
from django.core.exceptions import ValidationError

def is_string_an_url(url_string: str) -> bool:
    validate_url = URLValidator()
    try:
        validate_url(url_string.strip())
        return True
    except ValidationError:
        return False
Examples:
print(is_string_an_url("https://example.com")) # True print(is_string_an_url("not a url")) # False
Adding Django just for its URL validation is probably overkill, but if you’re in a Django project already, this is one of the most reliable approaches.
Bonus: Pydantic for Structured Validation
If you’re working with data models or APIs, Pydantic provides another robust way to validate URLs (and more) using Python type hints and schema validation. It’s especially handy when you want validation and structured error handling as part of your model definitions.
from requests import get, HTTPError, ConnectionError
from pydantic import BaseModel, AnyHttpUrl, ValidationError

class MyConfModel(BaseModel):
    URI: AnyHttpUrl

try:
    myAddress = MyConfModel(URI="http://myurl.com/")
    req = get(myAddress.URI, verify=False)
    print(myAddress.URI)
except ValidationError:
    print('Invalid destination')
Pydantic’s AnyHttpUrl will catch invalid URLs and raise a ValidationError. This is useful for ensuring that configuration, user input, or API parameters are valid before making requests or processing data.
Tested Patterns
Pydantic’s built-in validators are quite thorough. For example, the following URLs pass:
http://localhost
http://localhost:8080
http://example.com
http://user:password@example.com
http://_example.com
But these will fail validation:
http://&example.com
http://-example.com
If you need structured validation and meaningful error handling—especially in data models—Pydantic is a great addition to your toolkit.
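If you want to reproduce this list yourself, a quick sketch (reusing the MyConfModel and ValidationError from the snippet above) is to loop over the candidate URLs and catch the validation errors:

test_urls = [
    "http://localhost",
    "http://localhost:8080",
    "http://example.com",
    "http://user:password@example.com",
    "http://_example.com",
    "http://&example.com",
    "http://-example.com",
]

for url in test_urls:
    try:
        MyConfModel(URI=url)   # model and ValidationError come from the Pydantic example above
        print(f"{url}: valid")
    except ValidationError:
        print(f"{url}: invalid")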
Practical Testing and Edge Cases
Testing matters! Don’t forget to write cases for empty URLs, missing schemes, malformed domains, and subtle variants:
import pytest

def test_empty_url():
    with pytest.raises(Exception, match="No URL specified"):
        validate_url("")

def test_missing_scheme():
    with pytest.raises(Exception, match="No URL scheme specified"):
        validate_url("example.com")

def test_malformed_domain():
    with pytest.raises(Exception, match="URL domain malformed"):
        validate_url("http://.bad_domain")
Testing both the positive and negative cases ensures your validator does exactly what you expect—no more, no less.
Why Trim Whitespace Before URL Validation?
Before validating a URL, it’s essential to remove any leading or trailing spaces from the string. Even an extra space at the start or end—something easy to miss when copying and pasting—will cause most validation methods, including Python’s strict regex patterns, to treat the URL as invalid.
For example, "http://localhost:8000 "
(with a trailing space) will fail validation, even though the actual URL is fine. By using Python’s strip()
method, you ensure you’re testing the true URL as intended:
url = "http://localhost:8000 " is_valid = is_string_an_url(url.strip()) # Returns True
Trimming whitespace helps your validations stay reliable, prevents false negatives, and ensures your applications don’t accidentally reject legitimate URLs due to minor copy-paste issues.
A More Robust Solution: Comprehensive URL Validation
While the above regex patterns cover most everyday use-cases, URLs in the wild can be quite unpredictable. For bulletproof validation—recognizing everything from localhost to exotic internationalized domains, and robustly excluding invalid edge cases—you may want something more thorough.
Here's a regex pattern that takes into account:
Protocols: Supports http, https, ftp, rtsp, rtp, and mmp
Authentication: Handles optional user:pass@ credentials
IP Addresses: Accepts public IPs, rejects private/local addresses (e.g., 127.0.0.1, 192.168.x.x)
Hostnames & International Domains: Supports Unicode characters and punycode
Ports: Optional, supports typical port ranges
Paths & Queries: Optional, matches any valid path, query string, or fragment
import re

# Octet building blocks for dotted-quad IPv4 matching
ip_middle_octet = r"(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5]))"
ip_last_octet = r"(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))"

URL_PATTERN = re.compile(
    r"^"                                           # start of string
    r"(?:(?:https?|ftp|rtsp|rtp|mmp)://)"          # protocol
    r"(?:\S+(?::\S*)?@)?"                          # optional user:pass@
    r"("                                           # host/ip group
    r"(?:localhost)"                               # localhost
    r"|"
    # reject private & link-local ranges, then accept public IPv4 addresses
    r"(?!(?:10|127)" + ip_middle_octet + r"{2}" + ip_last_octet + r")"          # 10.x.x.x, 127.x.x.x
    r"(?!(?:169\.254|192\.168)" + ip_middle_octet + ip_last_octet + r")"        # 169.254.x.x, 192.168.x.x
    r"(?!172\.(?:1[6-9]|2\d|3[0-1])" + ip_middle_octet + ip_last_octet + r")"   # 172.16.x.x - 172.31.x.x
    r"(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])" + ip_middle_octet + r"{2}" + ip_last_octet +  # public IPs
    r"|"
    # domain names with TLD (Unicode-aware)
    r"(?:(?:[a-z\u00a1-\uffff0-9_-]-?)*[a-z\u00a1-\uffff0-9_-]+"
    r"(?:\.(?:[a-z\u00a1-\uffff0-9_-]-?)*[a-z\u00a1-\uffff0-9_-]+)*"
    r"(?:\.(?:[a-z\u00a1-\uffff]{2,})))"
    r")"
    r"(?::\d{2,5})?"                               # optional port
    r"(?:/\S*)?"                                   # optional resource path
    r"(?:\?\S*)?"                                  # optional query
    r"$",
    re.UNICODE | re.IGNORECASE,
)

def url_validate(url):
    """URL string validation"""
    return URL_PATTERN.match(url)
Why use this?
If you're building forms or tools that need to reliably validate user-submitted URLs—including those with edge-case hostnames or public IP addresses—this pattern will catch what simpler regexes might miss. For example, it will recognize http://sub.例子.测试:8080/path?foo=bar and reject a string like http://192.168.1.1, which is a private IP.
Choose the right level of strictness for your needs:
For simple checks (e.g., ensuring a URL looks legit), the first few regexes are fast and easy.
If you need enterprise-grade validation or want to be sure you’re not letting through malformed or local network URLs, the comprehensive solution above is your friend.
Extending URL Validation for IPv6 Support
To make your URL validation regex compatible with IPv6 addresses, you’ll need to do two things:
Integrate a robust IPv6 validator regex (for example, from a trusted library or resource like Markus Jarderot’s pattern).
Adjust your URL parsing logic so it can recognize and accept IPv6 notation within URLs. This typically involves allowing square brackets around the IP portion (e.g., http://[2001:db8::1]:8080/).
A sample step in your validation routine could look like this:
When parsing the domain or host part of the URL, check if it’s an IPv6 address using your IPv6 validator. If so, ensure it matches the expected bracketed format for URLs.
By adding these enhancements, your validator will be able to handle URLs featuring IPv6 addresses alongside standard domain names or IPv4 addresses.
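As a minimal sketch of that routine, assuming only the standard library's urllib.parse and ipaddress modules (the helper name host_is_valid is chosen for illustration), the host check might look like this:

from urllib.parse import urlparse
import ipaddress
import re

DOMAIN_FORMAT = r"[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"  # simple domain check, as in the earlier examples

def host_is_valid(url: str) -> bool:
    parsed = urlparse(url.strip())
    host = parsed.hostname  # brackets around IPv6 literals are stripped automatically
    if not parsed.scheme or not host:
        return False
    try:
        # Accepts both IPv4 and IPv6 literals (e.g. http://[2001:db8::1]:8080/)
        ipaddress.ip_address(host)
        return True
    except ValueError:
        # Not an IP literal; fall back to a domain-name check
        return re.fullmatch(DOMAIN_FORMAT, host) is not None or host == "localhost"

print(host_is_valid("http://[2001:db8::1]:8080/"))  # True
print(host_is_valid("https://example.com/path"))    # True
print(host_is_valid("http://not a host"))           # False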
Ensuring Your String Is a Single Valid URL
A common pitfall in URL validation is accidentally matching inputs like http://www.google.com/path,www.yahoo.com/path
as a single valid URL, when it's really two URLs separated by a comma. To prevent this and ensure your string is exactly one, clean, valid URL, follow these tips:
Anchor the regex: Always use ^ (start) and $ (end) in your pattern. This way, only a string that is a single URL—nothing more, nothing less—will be accepted.
Avoid matching delimiters: Do not allow characters such as commas or spaces after (or before) the URL in your regex.
No partial matches: Use the fullmatch() method rather than match() or search(). It checks if the whole string matches your pattern—not just a part of it.
Here's how your validation logic should look in Python:
import re

def is_strict_single_url(url):
    # Regex allows http/https, domains, subdomains, and optional paths/queries
    pattern = re.compile(r'^(http|https):\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/[a-zA-Z0-9\-._~:/?#[\]@!$&\'()*+,;=]*)?$')
    return bool(pattern.fullmatch(url))
By using pattern.fullmatch(url)
, any extra commas, whitespace, or multiple URLs in the string will cause validation to fail—ensuring only single, proper URLs get through.
Enforcing a Maximum URL Length
To make sure your URLs don’t sneak past a set maximum length—commonly 2048 characters—you can add a simple length check before validating the rest of the URL. This is useful for keeping your forms and applications safe from overly long or potentially malicious links.
Here’s what you can do:
Trim whitespace from the input to avoid counting accidental spaces.
Check the length of the URL string.
Raise an error or reject the URL if it’s too long.
For example, before running your usual regex or validation logic:
MAX_URL_LENGTH = 2048

url = url.strip()
if len(url) > MAX_URL_LENGTH:
    raise ValueError(f"URL exceeds the maximum length of {MAX_URL_LENGTH} characters (got {len(url)})")

# Proceed with your normal URL validation checks here
This way, you immediately filter out any URLs that overshoot your preferred limit, keeping your processing tight and controlled. In most web and API environments, 2048 characters is a practical upper bound—used by browsers like Chrome and tools such as Postman—so it’s a solid default.
Python Example Code
import re

def is_valid_url(url):
    pattern = re.compile(r'^(http|https):\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/[a-zA-Z0-9\-._~:/?#[\]@!$&\'()*+,;=]*)?$')
    return bool(pattern.fullmatch(url))

# Test URLs
print(is_valid_url("https://qodex.ai"))          # True
print(is_valid_url("http://example.com/path"))   # True
print(is_valid_url("ftp://invalid.com"))         # False
Try variations using the Python Regex Tester.
How to Split and Validate a URL Using urllib.parse and Regex
To thoroughly validate a URL in Python, you'll often need to do more than just match the full string with a single regex. Here's a flexible approach combining Python’s urllib.parse
with targeted regular expressions for each URL component:
Break Down the URL:
Use urllib.parse.urlparse() to split your URL into its core parts:
Scheme (http, https)
Netloc (domain, subdomain, and optional port)
Path, query, fragment, etc.
Validate Each Piece:
Apply regular expressions to the components that matter most for your use case:
Scheme: Ensure it’s http or https.
Netloc: Confirm it’s a valid domain name or IP address, and optionally check for a port (e.g., example.com:8080).
Path: If needed, add checks for valid characters in the path segment.
IP Address Support:
If your URLs might contain IP addresses instead of domain names, include regex patterns capable of matching IPv4 addresses. For IPv6 support, use a specialized IPv6 validator—such as Markus Jarderot’s widely regarded regex—for robust parsing.
Example Workflow:
Parse the URL:
from urllib.parse import urlparse
import re

url = "https://127.0.0.1:5000/home"
parsed = urlparse(url)
Validate scheme:
if parsed.scheme not in ["http", "https"]:
    # Handle invalid scheme
    raise ValueError(f"Unsupported scheme: {parsed.scheme!r}")
Validate netloc (domain or IP, with optional port):
domain_pattern = r"^[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
ipv4_pattern = r"^\d{1,3}(\.\d{1,3}){3}$"
port_pattern = r":\d{2,5}$"

netloc = parsed.netloc.split(':')[0]  # Extract domain/IP
For IPv6, integrate a dedicated validation function to accurately detect and confirm legitimate IPv6 addresses.
This modular technique gives you fine-grained control: you can adapt the regex and logic to your specific form, crawler, or parser requirements. It’s a great way to manage tricky edge cases that simple string-wide regex approaches might miss.
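Putting those steps together, a minimal sketch of the modular approach (the helper name validate_url_parts and the exact patterns are illustrative, not a fixed recipe) could look like this:

import re
from urllib.parse import urlparse

DOMAIN_PATTERN = r"[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
IPV4_PATTERN = r"\d{1,3}(\.\d{1,3}){3}"

def validate_url_parts(url: str) -> bool:
    """Split the URL and check each component separately."""
    parsed = urlparse(url.strip())

    # Scheme: only http/https
    if parsed.scheme not in ("http", "https"):
        return False

    # Netloc: strip an optional port, then accept a domain, an IPv4 address, or localhost
    host, _, port = parsed.netloc.partition(":")
    if port and not re.fullmatch(r"\d{2,5}", port):
        return False
    if not (host == "localhost"
            or re.fullmatch(DOMAIN_PATTERN, host)
            or re.fullmatch(IPV4_PATTERN, host)):
        return False

    return True

print(validate_url_parts("https://127.0.0.1:5000/home"))  # True
print(validate_url_parts("https://example.com/docs"))     # True
print(validate_url_parts("ftp://example.com"))            # False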
Detecting Potential URLs Beyond Strict Patterns
Sometimes, you may need to recognize tokens that could be URLs—even if they're not perfectly formed. For instance, you might encounter strings like yandex.ru.com/somepath or www.example.org, which don’t always match the strictest regex but still represent URLs in practice.
To address this, consider checking two things:
Does the text start with common URL schemes or prefixes (like http, www, or ftp)?
Does it end with a valid public domain suffix?
Here's a practical Python example that fetches an up-to-date list of public domain suffixes and uses them to identify likely URLs:
import requests

def get_domain_suffixes():
    res = requests.get('https://publicsuffix.org/list/public_suffix_list.dat')
    suffixes = set()
    for line in res.text.splitlines():
        line = line.strip()
        if line and not line.startswith('//'):
            domains = line.split('.')
            cand = domains[-1]
            if cand:
                suffixes.add('.' + cand)
    return tuple(sorted(suffixes))

domain_suffixes = get_domain_suffixes()

def reminds_url(txt: str):
    """
    Returns True if the text looks like a URL.

    Example:
    >>> reminds_url('yandex.ru.com/somepath')
    True
    """
    ltext = txt.lower().split('/')[0]
    return ltext.startswith(('http', 'www', 'ftp')) or ltext.endswith(domain_suffixes)
This approach is especially useful for quick validation or preprocessing, where you want to capture URLs even if they're missing a protocol or have unusual structures.
Handling Python 2 and Python 3 for URL Validation
Python’s urlparse
module is a handy way to validate URLs, but the import path changes between Python 2 and Python 3. Here’s how to ensure compatibility and robust URL checking across both versions.
Cross-Version Import
Depending on your environment, you’ll want to handle the import gracefully:
try:
    # For Python 2
    from urlparse import urlparse
except ImportError:
    # For Python 3
    from urllib.parse import urlparse
Example Function for URL Validation
After importing urlparse, you can create a simple validator function:
def is_valid_url(url):
    try:
        result = urlparse(url)
        return all([result.scheme, result.netloc])
    except (AttributeError, TypeError):
        return False
This function checks for both a scheme (like http or https) and a network location, filtering out partial paths, numbers, and malformed strings.
Sample Usage
urls = [
    "http://www.cwi.nl:80/%7Eguido/Python.html",  # Valid
    "/data/Python.html",                          # Invalid (missing scheme)
    532,                                          # Invalid (not a string)
    u"dkakasdkjdjakdjadjfalskdjfalk",             # Invalid (nonsense string)
    "https://qodex.ai"                            # Valid
]

results = [is_valid_url(u) for u in urls]
print(results)  # Output: [True, False, False, False, True]
This approach keeps your validation logic compatible regardless of whether you're running Python 2 or Python 3. And of course, it’s a good companion to using regex for more nuanced rules.
Validating URLs with Pydantic
Another slick option for URL validation in Python is the Pydantic library. While it’s most famous for parsing and validating data for FastAPI and configuration models, Pydantic actually provides a robust set of URL data types out of the box.
Pydantic’s URL Types: A Quick Overview
Pydantic comes with several helpful field types—perfect if you need more specificity than just “any old URL.” For example:
AnyUrl: Accepts nearly all valid URLs, including custom schemes.
AnyHttpUrl: Restricts to HTTP and HTTPS URLs.
HttpUrl: Demands HTTP/HTTPS, includes checks for host and TLD.
FileUrl, PostgresDsn, etc.: Specialized for files or specific database connections.
Refer to the documentation for a full list of options and scheme support.
How to Use with a Minimal Example
Here’s a typical usage pattern with Pydantic:
from pydantic import BaseModel, AnyHttpUrl, ValidationError

class Config(BaseModel):
    endpoint: AnyHttpUrl  # or choose the URL type you need

try:
    conf = Config(endpoint="http://localhost:8080")
    print(conf.endpoint)  # Will print a validated URL
except ValidationError:
    print("Not a valid HTTP(s) URL")
Attempting to create a model with an invalid URL will raise a ValidationError you can catch to handle input errors gracefully.
Pydantic also helps clarify why a value is invalid in its error messages.
Limitations and Gotchas
While Pydantic’s validators are thorough, keep in mind:
Some schemes (like ftp or database DSNs) require AnyUrl or more specific types (like PostgresDsn).
The strictness of validation depends on which field type you pick.
Leading/trailing spaces should be trimmed before assignment (Pydantic will usually do this, but don’t rely on it for noisy or poorly sanitized input).
Sample URLs and Outcomes
Here’s a taste of how Pydantic’s AnyHttpUrl responds:
"http://localhost" – valid
"http://localhost:8080" – valid
"http://user:password@example.com" – valid
"http://_example.com" – valid (underscore accepted)
"http://&example.com" – invalid (symbol not allowed)
"http://-example.com" – invalid (hyphen at start is rejected)
For comprehensive URL checks, Pydantic combines convenience with clarity—making your data models safer with minimal effort.
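To make the scheme caveat from the limitations above concrete, here is a small sketch (assuming Pydantic is installed; the model names are purely illustrative) comparing AnyUrl and AnyHttpUrl:

from pydantic import BaseModel, AnyUrl, AnyHttpUrl, ValidationError

class Loose(BaseModel):
    link: AnyUrl       # accepts ftp://, postgres://, custom schemes, ...

class HttpOnly(BaseModel):
    link: AnyHttpUrl   # accepts only http:// and https://

print(Loose(link="ftp://files.example.com/archive.zip").link)  # validates fine

try:
    HttpOnly(link="ftp://files.example.com/archive.zip")
except ValidationError:
    print("AnyHttpUrl rejects non-HTTP schemes")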
Checking the Latest Public Suffixes for Domain Validation
Sometimes, validating a URL or domain isn’t just about confirming the syntax—especially if you want to ensure your code recognizes valid top-level domains (TLDs) and public suffixes. To stay current with domain extensions (including newer ones like .dev, .app, or .io), you can programmatically retrieve the official public suffix list maintained by Mozilla.
Here’s a simple Python approach that pulls the latest list directly from publicsuffix.org and extracts all recognized domain suffixes:
import requests

def fetch_public_suffixes():
    response = requests.get('https://publicsuffix.org/list/public_suffix_list.dat')
    suffixes = set()
    for line in response.text.splitlines():
        line = line.strip()
        # Skip comments and empty lines
        if line and not line.startswith('//'):
            suffixes.add('.' + line)
    return tuple(sorted(suffixes))

# Fetch the latest suffixes
domain_suffixes = fetch_public_suffixes()
What this does:
Downloads the current public suffix list.
Ignores comments and empty lines in the dataset.
Collects each suffix into a tuple for easy lookups.
This technique helps ensure your domain validation logic is aware of every TLD currently recognized by major browsers and libraries—so you’re not blindsided by new suffixes.
Use this in your URL or email checker to make your validations future-proof and standards-compliant.
When Should You Do a DNS Check?
A quick note for thoroughness: validating a URL's format—whether using regex, the validators
package, or Django’s built-in tools—only ensures the string looks like a URL. It doesn’t tell you whether that URL actually exists or leads to a live destination.
That’s where DNS checks come in. If you truly need to confirm that a URL points to a real, resolvable domain (e.g., verifying "https://www.google" isn’t just well-formed, but actually goes somewhere), you’ll need to go a step further by performing a DNS lookup. This process asks, "Does this domain exist on the internet right now?"—something no regex or typical package will answer for you.
DNS checks aren’t always necessary for basic validation tasks like form inputs or static checks. But, if you’re building anything that relies on external connectivity (think: crawlers, link checkers, or automated testing tools), adding a DNS resolution step is a good way to catch invalid or unavailable domains before they cause trouble later in your workflow.
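If you do need that extra step, one minimal sketch using only the standard library's socket module (the function name domain_resolves is illustrative) is:

import socket
from urllib.parse import urlparse

def domain_resolves(url: str) -> bool:
    """Return True if the URL's hostname currently resolves in DNS."""
    hostname = urlparse(url.strip()).hostname
    if not hostname:
        return False
    try:
        socket.getaddrinfo(hostname, None)  # works for IPv4 and IPv6
        return True
    except socket.gaierror:
        return False

print(domain_resolves("https://www.google.com"))        # True (when online)
print(domain_resolves("https://no-such-host.invalid"))  # False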
Using the validators Package for URL Validation
If you prefer not to write your own regex, you can easily check if a string is a valid URL by using the popular validators Python package. Here’s a straightforward approach:
import validators

def is_valid_url(url_string):
    result = validators.url(url_string)
    return result is True  # Returns True if valid, False otherwise

# Example usage
print(is_valid_url("http://localhost:8000"))  # True
print(is_valid_url("http://.www.foo.bar/"))   # False
The function is_valid_url returns True only when the provided string passes all URL checks performed by the library.
Internally, validators.url() returns True when valid, or a ValidationFailure object when not—so this function keeps things simple.
Use this for quick, robust validation without wrangling regex patterns.
Quick tip: Always strip spaces before validation, especially if the URL is coming from user input or copy-paste operations.
This approach is efficient, readable, and saves you from reinventing the wheel when working with URLs in Python.
Using Python Standard Library for URL Validation
Prefer to stick with the standard library? You can use urlparse (available via urllib.parse in Python 3 and urlparse in Python 2) to check whether a string is structured like a URL—without installing any third-party libraries.
Here's a basic approach:
try:
    # Python 2
    from urlparse import urlparse
except ImportError:
    # Python 3
    from urllib.parse import urlparse

def is_valid_url(string):
    try:
        result = urlparse(str(string).strip())
        # Validation: must have scheme (like http/https) and netloc (domain)
        return all([result.scheme, result.netloc])
    except Exception:
        return False
Examples:
print(is_valid_url('http://www.python.org'))  # True
print(is_valid_url('/just/a/path/file.txt'))  # False
print(is_valid_url(12345))                    # False
print(is_valid_url('not a url at all'))       # False
print(is_valid_url('https://github.com'))     # True
Note: URL parsing checks structure, not whether the URL is actually reachable on the internet. For more extensive validation (including syntax and even DNS lookups), consider using packages like validators or requests. But for basic checks, urlparse fits the bill.
Making Validated URL Objects Act Like Strings
Say you’ve wrapped URL validation inside a custom class—how do you make sure your objects still behave like regular strings throughout your codebase? It’s simple: just ensure your class inherits from str
directly, or implements the required string methods. This way, once a URL has passed your checks, you can use it anywhere a string is expected.
For example:
class ReachableURL(str):
    def __new__(cls, url):
        # Validate the URL here...
        # (Assume validation passes for this example)
        return str.__new__(cls, url)
Now, instances of ReachableURL
can be used seamlessly—just like ordinary strings:
url_instance = ReachableURL("http://example.com")
print(isinstance(url_instance, str))  # True
print(url_instance.upper())           # HTTP://EXAMPLE.COM
This approach lets you layer extra functionality (like validation or reachability checks) while retaining all the familiar power of Python’s string operations. So, whether you’re concatenating, slicing, or handing off URLs to other libraries, you’ll keep everything as clean and Pythonic as possible.
Using Django’s URLValidator to Check URLs in Python
Django comes with a handy built-in tool for validating URLs: the URLValidator
from the django.core.validators
module. This validator is designed to determine whether a given string matches the criteria for a valid web address. If you’re already using Django in your project, it’s a convenient and reliable approach for URL validation.
Here’s how you can use it:
Import the Necessary Classes:
URLValidator for the actual validation
ValidationError to handle invalid cases
Write a Simple Validation Function:
from django.core.validators import URLValidator
from django.core.exceptions import ValidationError

def is_valid_url(url: str) -> bool:
    validator = URLValidator()
    try:
        validator(url)
        return True
    except ValidationError:
        return False
When you call validator(url), it checks if the url string adheres to standard URL patterns.
If the supplied value isn’t a valid URL, it raises a ValidationError. The function returns True for valid URLs and False for invalid ones.
While Django’s URLValidator is powerful, keep in mind that adding Django as a dependency may be unnecessary for lightweight projects. However, for those already using Django, it’s a robust option for all your URL validation needs.
How validators.url Works in Python
If you're looking for a quick, reliable way to check whether a string is a valid URL in Python, the validators package is a handy tool. Its url function makes URL validation straightforward—even for tricky cases.
How It Validates
Pass your URL as a string to validators.url().
If the input is a valid URL, you'll get True as the result.
If the URL is not valid, instead of a simple False, it returns an object called ValidationFailure. While this might feel a bit unexpected, it still makes it easy to know whether your URL passes or fails validation.
Example Usage
Here’s what a typical validation flow might look like:
import validators
from validators import ValidationFailure

def is_string_a_url(candidate: str) -> bool:
    result = validators.url(candidate)
    return False if isinstance(result, ValidationFailure) else result

# Checking results
print(is_string_a_url("http://localhost:8000"))  # Outputs: True
print(is_string_a_url("http://.www.foo.bar/"))   # Outputs: False
This approach ensures you're only working with recognized, well-formed URLs—perfect for situations where data quality matters most.
Enforcing URL Validation with Python Classes and Type Checking
If you want to enforce URL validation at a deeper level across your codebase—not just at input time or form submission—you can use Python’s class inheritance and type checking. By encapsulating validation logic inside custom types, you make it much harder for invalid URLs to sneak into your application logic or data models.
Example: Creating a URL Type with Built-in Validation
You can define a custom string subclass that validates its input every time it’s instantiated. This approach leverages Python’s rich data model and is especially useful if you’re working in larger codebases, or when you want your function signatures and type hints to truly mean “URL—not just any string!”
Here’s a typical pattern using standard library features and popular utilities:
from urllib.parse import urlparse

class URL(str):
    def __new__(cls, value: str):
        result = urlparse(value)
        # Only allow non-empty scheme and netloc (host/address)
        if not (result.scheme and result.netloc):
            raise ValueError(f"Invalid URL: {value!r}")
        return str.__new__(cls, value)
Usage Example:
site = URL("https://wikipedia.org")  # works fine
another = URL("not a url")           # raises ValueError
Any attempt to create a URL
object with an invalid address immediately results in an error, so only valid URLs can be used downstream.
Benefits of Using Custom Types
Early Validation: Problems surface instantly at the object creation stage.
Type Safety: Your IDE and static analysis tools (e.g., mypy) can help catch mistakes when you annotate with your custom URL type.
Cleaner Code: Functions and classes that require URLs can explicitly declare so, boosting readability and reducing runtime surprises.
Extending Functionality
For stricter checks—such as ensuring the URL is reachable, uses HTTPS, or isn’t a localhost address—simply extend your base URL class and add additional validation in __new__.
import socket
from urllib.parse import urlparse

class ReachableURL(URL):  # extends the URL class defined above
    def __new__(cls, value: str):
        instance = super().__new__(cls, value)
        hostname = urlparse(instance).hostname
        if not hostname:
            raise ValueError(f"Invalid URL: {value!r}")
        try:
            socket.gethostbyname(hostname)
        except socket.error:
            raise ValueError(f"Hostname not resolvable: {hostname}")
        return instance
With this approach, your URL handling code stays explicit, self-documenting, and robust—whether you’re writing a web crawler, building APIs with FastAPI or Django, or just aiming for cleaner domain models.
Django’s URL Validator vs. Standalone Packages
If you're considering URL validation for your project, you might wonder whether to rely on Django’s built-in utility or opt for a lightweight standalone package. Here’s a quick breakdown to help you weigh the options:
Advantages of Django’s URL Validator:
Comprehensive Checks: Django’s validator is well-tested and supports standard URL patterns; older Django versions even offered a verify_exists option to check whether a URL actually exists (it has since been removed).
Integration: Seamlessly fits within the rest of Django’s validation ecosystem, making it perfect for projects already using Django for forms or models.
Community and Documentation: You benefit from a large, active community and thorough documentation—making it easier to troubleshoot or extend.
Drawbacks to Consider:
Dependency Bloat: Including Django just for URL validation can be overkill—Django is a robust, full-featured framework, and significantly increases your project’s size and dependencies if you’re not already using it.
Complexity: For smaller scripts, microservices, or non-Django projects, a standalone library (such as validators or simple regex-based checks) will keep things lean and more easily maintained.
Performance: Extra dependencies sometimes add startup time and potential version conflicts, especially in minimalist environments.
Summary:
If your stack already uses Django, leveraging its URL validator is a solid and hassle-free choice. For lightweight projects or scripts, standalone validation packages or tailored regex rules will keep your footprint minimal and setup easier. Choose based on your project’s needs and existing tech stack!
Use Cases
Form Validation: Ensure users submit well-structured URLs in web forms.
Data Cleaning: Remove or fix malformed links in large datasets.
Crawlers & Scrapers: Verify URLs before crawling or scraping content.
Security Filtering: Block suspicious or malformed URLs from being stored or executed.
Useful tools:
IP Address Regex Python Validator – for network-based URL checks
Email Regex Python Validator – to validate contact forms
Password Regex Python Validator – when securing user input alongside URLs
Categorized Regex Metacharacters
^: Matches the start of the string
$: Matches the end of the string
.: Matches any character (except newline)
+: Matches one or more of the previous token
*: Matches zero or more of the previous token
?: Makes the preceding token optional
[]: Matches any one character inside the brackets
(): Groups patterns
|: OR operator
\: Escapes special characters like ":"
Pro Tips
Always use raw strings (r'') in Python to avoid escaping issues.
Add anchors ^ and $ to match the full URL and avoid partial matches.
Use non-capturing groups (?:...) for cleaner matching if needed.
Test localhost or custom ports using a regex like: localhost:\d{2,5}
Combine this validator with IP Address Regex Python Validator for APIs or internal tools.
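Putting several of these tips together, a small sketch (the pattern below is illustrative, tuned for localhost URLs with a custom port) might look like:

import re

# Raw string, ^/$ anchors, non-capturing groups, and a localhost + custom-port check
LOCAL_URL = re.compile(r'^(?:http|https)://localhost:\d{2,5}(?:/\S*)?$')

print(bool(LOCAL_URL.fullmatch("http://localhost:8000/api/v1")))  # True
print(bool(LOCAL_URL.fullmatch("http://localhost")))              # False - this pattern expects a port
print(bool(LOCAL_URL.fullmatch("http://localhost:8000,extra")))   # False - anchors reject trailing junk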
More on Domain and Port Validation
When validating URLs, it's important to remember that the regex should handle both the scheme (like http or https) and the domain (or netloc) parts of the URL. The domain section includes everything up to the first slash /, so port numbers (like :8000) are safely included in this part of the match.
For example:
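Here is a minimal sketch (a port-aware variant of the basic http/https pattern; the sample URLs are illustrative), showing the port being matched as part of the domain section:

import re

# The netloc part, including an optional :port, is matched before the first slash
pattern = re.compile(r'^(http|https):\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(:\d{2,5})?(\/.*)?$')

print(bool(pattern.fullmatch("https://api.site.com:443/v1")))        # True - port stays in the netloc
print(bool(pattern.fullmatch("http://example.com:8000/dashboard")))  # True
print(bool(pattern.fullmatch("http://example.com:8000")))            # True - path is optional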
This approach ensures that your validator can match standard domains, custom ports, and even IPv4 addresses.
Supporting IPv6 Addresses
If you need to validate URLs containing IPv6 addresses, consider enhancing your regex or integrating a specialized IPv6 validator. A comprehensive IPv6 regex will handle the full range of valid address formats, so incorporate a solution like Markus Jarderot's IPv6 validator for best results. Remember to check both the domain and the IP format when validating.
Examples in Action
IPv4 and Alphanumeric Domains: Use a regex that matches standard domains and IPv4 addresses. For reference, a regex testing tool (such as the Python Regex Tester) can help you test and refine your patterns.
IPv6 Support: With the right regex, you can capture URLs using IPv6 addresses, ensuring your validation routine is robust for any environment—including internal networks or modern APIs.
By combining these approaches, your URL validation will be flexible enough for everything from localhost development to production-grade, multi-protocol endpoints.