I need help. I'm building a REST-style web service. Its URLs specify
operations on other URLs, so I need to pass URLs as parameters to my
REST service. Let's say my service reverses the text found at
whatever URL is specified.
http://example.com/rvs/someurl?a=b
The above request to my service means "return the contents of someurl reversed". To make things a bit
more complicated, there may be other stuff in my URLs after
someurl, such as the query argument a=b in my example above.
So my question is: how do I correctly encode someurl? Let's say I'm reversing http://google.com/. I'd think the request would contain a percent-encoded URL, something like
http://example.com/rvs/http%3A%2F%2Fgoogle.com%2F?a=b
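For illustration, here is a minimal Python sketch of how a client might build that request. The helper name build_request_url is hypothetical; the only real call is urllib.parse.quote, with safe="" so that "/" and ":" are encoded too (by default quote leaves "/" alone).

```python
from urllib.parse import quote

def build_request_url(base, target, query):
    """Embed target as a single percent-encoded path segment under base.

    build_request_url is a hypothetical helper, not part of any library.
    """
    # safe="" forces "/" and ":" to be escaped; quote() keeps "/" by default.
    encoded = quote(target, safe="")
    return f"{base}/{encoded}?{query}"

url = build_request_url("http://example.com/rvs", "http://google.com/", "a=b")
print(url)
```

This should produce the same request URL as the one written out above.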
However, the CGI
standard (and Apache2's implementation thereof) seems to
decode the URL before it gets to my application. That is,
PATH_INFO contains
http://google.com/
not
http%3A%2F%2Fgoogle.com%2F
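A rough sketch of pulling the still-encoded segment out of REQUEST_URI instead of PATH_INFO, in Python. It assumes the service is mounted at /rvs/ and that the server sets REQUEST_URI (an Apache convention rather than part of the CGI standard); the function name and the sample environ dict are made up for the example.

```python
from urllib.parse import unquote

def target_from_request_uri(request_uri, prefix="/rvs/"):
    """Return the embedded URL from an undecoded request URI.

    Assumes the service lives under prefix (hypothetical mount point).
    """
    # Cut off the query string first. A "?" inside the embedded URL would
    # still be encoded as %3F at this point, so the first literal "?" is ours.
    raw_path = request_uri.split("?", 1)[0]
    if not raw_path.startswith(prefix):
        raise ValueError("unexpected path: " + raw_path)
    # The rest is one encoded segment; decode it exactly once.
    return unquote(raw_path[len(prefix):])

# Stand-in for the CGI environment (os.environ in a real CGI script).
environ = {"REQUEST_URI": "/rvs/http%3A%2F%2Fgoogle.com%2F?a=b"}
target = target_from_request_uri(environ["REQUEST_URI"])
print(target)
```

The point of splitting before decoding is that the boundary between my own path and the embedded URL is still visible in the raw string.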
I can work around this
(REQUEST_URI isn't decoded), but something about all this
makes me uneasy. I'm relying on all web software between me and my
client to not mess with my carefully encoded URL. If the CGI standard
itself seems to think decoding is OK, who's to say some proxy or
browser won't too? Or to ask the question more existentially, do
these two URLs name the same resource? Or can they name different
things?
http://example.com/foo/bar
http://example.com/foo%2Fbar
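As far as I can tell, RFC 3986 does not treat a percent-encoded reserved character such as %2F as equivalent to the literal "/", so the two may name different things, though individual servers may still choose to conflate them. A small Python sketch of the distinction, splitting the path into segments before decoding each one (path_segments is a name invented for this example):

```python
from urllib.parse import urlsplit, unquote

def path_segments(url):
    """Split the raw path on "/" first, then decode each segment."""
    raw_path = urlsplit(url).path  # urlsplit leaves percent-escapes intact
    return [unquote(seg) for seg in raw_path.split("/")[1:]]

plain = path_segments("http://example.com/foo/bar")
escaped = path_segments("http://example.com/foo%2Fbar")
print(plain)
print(escaped)
```

The first URL yields two segments, foo and bar; the second yields a single segment that happens to contain a slash. Decoding the whole path before splitting would erase that difference.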
Is there a good way to talk about URLs inside URLs? If you know the answer, mail me. I promise to share.