Nelson's Weblog: tech / urlsAboutUrls2

URLs about URLs, some answers

I posed a question about embedding a subject URL in a request URL using percent encoding. Thank you for all the helpful replies; here's what I learned.

First, on the existential question, it seems in practice percent encoding really can create distinct names. Try:

http://somebits.com/weblog/
http://somebits.com%2Fweblog/

In reality this is just a quirk of Apache's handling of %2F, but since it's the default behaviour for the #1 web server out there, that's a strong example. As for the theory, this Wikipedia article claims that percent-encoding a reserved character does create a distinct name, whereas percent-encoding an unreserved character just creates an alias for the same name. So escaping / would make a new name but escaping something like 0 wouldn't. Confusing, huh?
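To make the reserved/unreserved distinction concrete, here is a minimal Python sketch of that rule. It follows my reading of RFC 3986, where the unreserved set is letters, digits, and - . _ ~. It is not a description of what Apache does, and the function names and example.com URLs are made up for illustration.

```python
import re
import string

# Unreserved characters per RFC 3986: letters, digits, and - . _ ~
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_escapes(url):
    """Decode escapes of unreserved characters; leave every other escape in place."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            # e.g. %30 is just another spelling of "0"
            return char
        # e.g. %2f stays escaped; only the hex digits are uppercased
        return "%" + match.group(1).upper()
    return _ESCAPE.sub(fix, url)


def same_name(a, b):
    """True if the two URLs differ only in how unreserved characters are escaped."""
    return normalize_escapes(a) == normalize_escapes(b)
```

Under this rule same_name("http://example.com/2%306", "http://example.com/206") is true, while same_name("http://example.com/a%2Fb", "http://example.com/a/b") is false, because / is reserved and its escape is never decoded.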

As to the practical problem of PATH_INFO being unescaped, basically everyone told me "yeah, CGI's a hack like that". So going with the hack, I'll just use the REQUEST_URI variable Apache sets. It's not documented anywhere I can find, but it seems to be an unadulterated literal copy of what the client requested, which I can then parse carefully. For my service, clients will need to know to percent-escape any / or ? in their URLs. And I'll just hope nothing else in the network decides it's OK to unescape things on me.
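Here is a rough sketch of what that careful parsing might look like in Python. The /about/ prefix, the trailing action segment, and the function names are all assumptions for illustration; the real service's URL layout may well differ. The only idea it relies on is that the split happens on the literal / and ? characters before anything is unescaped.

```python
from urllib.parse import quote, unquote

PREFIX = "/about/"  # hypothetical mount point for the service


def build_request_path(subject_url, action):
    """Client side: escape the whole subject URL, including / and ?."""
    # safe="" tells quote() to escape "/" as well, which it leaves alone by default
    return PREFIX + quote(subject_url, safe="") + "/" + action


def parse_request_uri(request_uri):
    """Server side: split the raw request URI first, unescape afterwards."""
    # The first literal "?" starts the query string of the request itself
    path, _, query = request_uri.partition("?")
    if not path.startswith(PREFIX):
        raise ValueError("unexpected path: " + path)
    # The first literal "/" after the prefix ends the escaped subject URL
    escaped_subject, _, action = path[len(PREFIX):].partition("/")
    return unquote(escaped_subject), action, query
```

In a CGI script this would be called as parse_request_uri(os.environ["REQUEST_URI"]), assuming the server passes the request through untouched as described above.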

Some other suggested workarounds: length-delimit the subject URL so you know where it ends, have a magic string delimiting the end of the subject URL that you hope doesn't appear in any legitimate subject URL, or put the subject URL at the end of the request URL so that it ends where it ends. Any of these solutions could be made to work; I was just looking for the principle.
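For comparison, two of those workarounds can be sketched in a few lines of Python. The "length:" framing and the function names are invented for illustration. The sketch also assumes the server hands the string over unmodified, which is exactly the thing that was in doubt with PATH_INFO.

```python
def encode_length_delimited(subject_url, rest):
    """Prefix the subject URL with its length so the parser knows where it ends."""
    return f"{len(subject_url)}:{subject_url}{rest}"


def decode_length_delimited(text):
    """Read the length up to the first ":", then slice out exactly that many characters."""
    length_str, _, remainder = text.partition(":")
    length = int(length_str)
    return remainder[:length], remainder[length:]


def decode_subject_last(path, prefix):
    """Subject URL at the end: everything after the fixed prefix belongs to it."""
    if not path.startswith(prefix):
        raise ValueError("unexpected path: " + path)
    return path[len(prefix):]
```

Neither version needs the subject URL to be escaped at all, which is what makes them workarounds rather than an answer to the question of principle.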

Thanks to SethG, RyanB, MikeB, GregW, GordonM, and SamR.

tech
2006-08-23 01:25 Z


Nelson Minar nelson@monkey.org
Blog licensed under a Creative Commons License