On reading the RFC, I wondered how I could determine whether a space is sufficient to delimit the end of a URI in the many possible cases. Ah, ask ChatGPT 4:
If we have a URI followed by a space, is it safe to assume the whole URI precedes that space?
ChatGPT
In the context of text processing, if you have a URI (Uniform Resource Identifier) followed by a space, it’s generally safe to assume that the URI is completed at that point. URIs do not include spaces; when a space appears, it’s usually a delimiter indicating the end of the URI.
However, there are a few caveats:
- Encoded Spaces: In certain cases, a space may appear in a URI but it would be URL-encoded as `%20` or `+`. Therefore, if you see these characters, it’s not the end of the URI.
- Unencoded Spaces: Depending on the context, unencoded spaces might still exist within a URI due to mistakes or misformatting. This could cause an issue if you are trying to identify or extract URIs based on spaces.
- Miscellaneous Delimiters: Note that spaces aren’t the only way URIs might be delimited in text. For example, URIs in HTML might be surrounded by quotes, and in Markdown they might be enclosed in parentheses for links.
To summarize, while a space often indicates the end of a URI, this may not always be true due to factors like encoding, misformatting, and different delimiters used in different contexts. Therefore, it’s always good to check for these conditions when processing URIs from a text.
I wonder, then, whether a regexp for a general set of delimiters, including space, would be better?
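To make that idea concrete, here is a rough sketch of what such a regexp might look like in Python. The delimiter set (whitespace, angle brackets, quotes, backticks), the trailing-punctuation list, and the names `URI_PATTERN`, `TRAILING` and `extract_uris` are all my own assumptions for illustration, not something taken from the RFC. It also only looks for URIs of the `scheme://...` form, so things like `mailto:` are deliberately out of scope.

```python
import re

# Assumed delimiter set: a URI runs until whitespace, an angle bracket,
# a quote character, or a backtick. Only "scheme://..." URIs are matched.
URI_PATTERN = re.compile(r"""\b[a-zA-Z][a-zA-Z0-9+.-]*://[^\s<>"'`]+""")

# Assumed punctuation that is more likely to end a sentence or wrap a link
# than to be the last character of the URI itself.
TRAILING = ".,;:!?)]}"


def extract_uris(text):
    """Return the URIs found in text, trimmed of trailing punctuation."""
    uris = []
    for match in URI_PATTERN.finditer(text):
        uri = match.group(0)
        while uri and uri[-1] in TRAILING:
            # Keep a closing parenthesis that balances one inside the URI,
            # e.g. .../wiki/URI_(disambiguation)
            if uri[-1] == ")" and uri.count("(") >= uri.count(")"):
                break
            uri = uri[:-1]
        uris.append(uri)
    return uris
```

With this sketch, `%20` and `+` stay inside the match because they are not delimiters, an HTML attribute value stops at the closing quote, and a Markdown link such as `[docs](https://example.com/docs).` loses its trailing `).`. An unencoded space inside a URI would still cut it short, so that caveat from the answer above remains.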