How to get correct URL in HTTP header?

Apache's handling of spaces in GET request

  • Hi, I have developed a PHP script. This script is intended to be accessed through the AOL Instant Messenger (AIM) client, rather than a typical web browser like Internet Explorer. A link to the .php file is placed in the AIM user's profile. The syntax of the URL query is:

        http://www.domain.com/view.php?id=16&nick=%n

    The variable "id" is an identification number that is not relevant to this question. The variable "nick" holds %n, which AOL Instant Messenger replaces with the visitor's AIM Screenname. Therefore, when a visitor with Screenname "big john" views this AIM user's profile, the visitor will see a link with the following URL (notice how "%n" is replaced with "big john"):

        http://www.domain.com/view.php?id=16&nick=big john

    The link has TARGET="_self", so when the visitor clicks the link, the page loads in AIM's profile window (AIM has its own internal browser) rather than launching an external browser like IE.

    The problem is that AIM's internal browser (user-agent: "AIM/30 (Mozilla 1.24b; Windows; I; 32-bit)"), unlike other browsers, does not replace spaces in URLs with %20 or +. Therefore, AIM sends the raw space directly to my Apache server (I am running Apache HTTP 1.3.22), resulting in the following request:

        "GET /view.php?id=16&nick=big john HTTP/1.0"

    Apache uses a space to differentiate between the request and the protocol. Since there is a misplaced space in "big john", the request is broken up and Apache incorrectly identifies "john" as the protocol (instead of HTTP/1.0). The result is an HTTP 400 error.

    My goal, obviously, is to make the page accessible to the visitor instead of giving him an HTTP 400 error. There are, however, several problems that complicate the situation:

    1.) I have no control over the %n portion of the URL. Since %n is automatically replaced by AIM with the visitor's Screenname, if the visitor has a space in his Screenname, the URL will automatically contain a raw space (as was the case with the "big john" example). I have no way of converting this raw space into either %20 or + before it is processed by Apache.

    2.) AIM will not replace the raw space for me. AIM, unlike every other browser, does not convert a URL's unsafe characters into their hex values. If Internet Explorer had handled this request, it would have converted the space into %20, resulting in the following request (which would have worked):

        "GET /view.php?id=16&nick=big%20john HTTP/1.0"

    AIM, however, does not do this, which is why I get stuck with this bad request:

        "GET /view.php?id=16&nick=big john HTTP/1.0"

    Because there is nothing I can do to prevent the raw space from being sent to Apache, and I cannot translate the space to %20, I need some way to configure Apache so that it will accept this bad request. From searching the Internet, it appears that the answer lies in using mod_rewrite. However, I have no experience with it, so I do not know what rule or rules I would need to add to make this work.

    I found the following solution on Google Groups (http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&c2coff=1&safe=off&selm=U6OU7.160915%24Ga5.25940562%40typhoon.tampabay.rr.com), but it did not work when I tried it on my server (I'm not sure whether the code is flawed, whether I implemented it wrong, or whether the solution is simply too old for my version of Apache). Another person (http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&c2coff=1&safe=off&selm=8krV9.4847%24kH3.1571%40sccrnsc03) was able to write a program to fix this problem, but I do not have the source code, nor do I know how to write it.

    I am requesting step-by-step instructions for a working solution that makes Apache handle my bad request so that the request will go through (I obviously need all variables intact, or at least intact enough that they can be manipulated by the view.php script). Since I have no experience with Apache or rewriting, I would need easy-to-follow directions complete with all necessary source code. Thanks in advance.

    P.S. Note that the solution lies in configuring Apache, and NOT in rewriting the PHP script. Any changes to the PHP script itself will be useless because Apache prevents the request from ever reaching the actual .php file.
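    To make the failure described in the question concrete, here is a minimal PHP sketch of how a request line splits on spaces. The helper name split_request_line is hypothetical and this is only an illustration of the idea, not Apache's actual parser; it assumes a reasonably modern PHP (7.0 or later) for the type declarations.

```php
<?php
// Hypothetical helper: split a raw HTTP request line on single spaces.
// A well-formed request line has exactly three fields:
// method, URI, protocol.
function split_request_line(string $line): array
{
    return explode(' ', $line);
}

$encoded = split_request_line('GET /view.php?id=16&nick=big%20john HTTP/1.0');
$raw     = split_request_line('GET /view.php?id=16&nick=big john HTTP/1.0');

echo count($encoded), "\n"; // 3 fields, as expected
echo count($raw), "\n";     // 4 fields, one too many
echo $raw[2], "\n";         // "john" sits where the protocol should be

// What a browser would normally have sent instead of the raw space:
echo rawurlencode('big john'), "\n"; // big%20john
echo urlencode('big john'), "\n";    // big+john
```

    The extra field is what a strict server rejects with a 400 response.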

  • Answer:

    Hi, whiteout, and thanks for your question.

    While it is possible to use the insanely powerful mod_rewrite to solve part of this problem, it's unnecessary, and can be extremely complex to get working properly.

    The first part of the problem is the '400 Bad Request' error that recent versions of Apache return. Apache generally conforms strictly to the HTTP protocol, which disallows any spaces in the request URI. Indeed, as you mentioned, more than two spaces in the request string will cause Apache to display the '400 Bad Request' page. Older versions of Apache (pre-1.3.26) would allow these malformed requests: they parsed the URI up to the first space and, if the remainder was not in the 'HTTP/x.x' format, ignored it and assumed HTTP/1.0. This error was fixed in 1.3.26.

    There is, however, a rare option - the 'ProtocolReqCheck' option - that will restore this functionality/bug. So, your first step should be to add the option 'ProtocolReqCheck off' in Section One of your Apache config file. If you are unsure where to add it, insert it on a new line after the 'ServerType standalone' line. (Be warned that clients that use this strange syntax will be assumed to be HTTP/1.0 clients, and will possibly lose HTTP/1.1 functionality. This should not really be a problem, though.)

    When Apache is next restarted, requests in the form

        GET /?z=bla bla HTTP/1.1

    will be processed by Apache. This is not the complete solution to the problem, though, since Apache will only process the URL up to the first space (so $_GET['z'] will be 'bla', not 'bla bla', in the above example). You can get around this by using getenv('SERVER_PROTOCOL'), which returns everything after the URL (for example, with the above request, it would return 'bla HTTP/1.1').
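    The split described above (URL up to the first space, everything else treated as the protocol string) can be simulated in a few lines of PHP. This is only a sketch of the behaviour as the answer describes it, not Apache's own code; the function name lenient_parse and the returned array keys are made up for illustration, and the syntax assumes PHP 7.1 or later.

```php
<?php
// Hypothetical simulation (not Apache's actual code) of the lenient parsing
// described above: the URI ends at the first space after the method, and
// whatever follows is kept as the "protocol" string.
// Assumes the request line contains at least two spaces.
function lenient_parse(string $requestLine): array
{
    // A limit of 3 keeps any later spaces inside the final field.
    [$method, $uri, $protocol] = explode(' ', $requestLine, 3);

    // Parse the (truncated) query string roughly the way PHP fills $_GET.
    $query = [];
    parse_str((string) parse_url($uri, PHP_URL_QUERY), $query);

    return [
        'method'   => $method,
        'uri'      => $uri,
        'protocol' => $protocol,
        'get'      => $query,
    ];
}

$r = lenient_parse('GET /view.php?id=16&nick=big john HTTP/1.0');
echo $r['get']['nick'], "\n"; // big
echo $r['protocol'], "\n";    // john HTTP/1.0
```

    The second half of the nickname is not lost; it ends up in the protocol string, which is why reading SERVER_PROTOCOL in the script can recover it.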
    The code below would work for your example, setting 'nick' to the full nickname:

        // get everything after the URL in the request
        $serv_prot = getenv('SERVER_PROTOCOL');
        // get rid of the HTTP/x.x part of it
        $sans_protocol = str_replace(array(" HTTP/1.0", " HTTP/1.1"), "", $serv_prot);
        // set nick to nick from query string + space + $sans_protocol from above
        $nick = $_GET['nick'] . " " . $sans_protocol;

    This code would have to appear at the top of any script that needs to access the full nickname. The 'nick' variable will then contain the full nickname. The code will also need modifying if there are additional variables specified after 'nick' in the query string, or if 'nick' is renamed.

    This has been tested and works on Apache 1.3.29. It should work on any version of Apache 1 >= 1.3.27.

    These sites may provide further information:

        http://forums.devshed.com/archive/t-58255
        http://forums.devshed.com/t46291/s.html
        http://forums.devshed.com/t26614/s.html
        http://apache.active-venture.com/mod/core8.htm (the last item on the page)

    The following searches may be of use to you:

        ('subprofile url spaces'): ://www.google.com/search?q=subprofile+url+spaces (subprofile.com has the ability to process these types of requests)
        ://www.google.com/search?q=aim+profiles+space+%22400+bad+request%22

    I hope this is of use to you. If I was unclear in any part, please do not hesitate to request a clarification.

    --wildeeo-ga
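    One case the snippet in the answer does not seem to cover is a Screenname without any space: SERVER_PROTOCOL is then just 'HTTP/1.0', the str_replace patterns (which start with a space) do not match, and the protocol text would be appended to the nickname. A possible variation is sketched below. The function name recover_trailing_value is hypothetical and not part of the original answer, and the code assumes a modern PHP (7.0 or later).

```php
<?php
// Hypothetical helper, not part of the original answer: rebuild a value that
// was cut at its first space, given the truncated value and the raw
// SERVER_PROTOCOL string.
function recover_trailing_value(string $truncated, string $serverProtocol): string
{
    // Drop the trailing "HTTP/x.y" token, with or without a leading space.
    $rest = preg_replace('~\s*HTTP/\d+\.\d+$~', '', $serverProtocol);

    // Nothing left means the original value contained no space at all.
    return $rest === '' ? $truncated : $truncated . ' ' . $rest;
}

echo recover_trailing_value('big', 'john HTTP/1.0'), "\n";     // big john
echo recover_trailing_value('big', 'bad john HTTP/1.1'), "\n"; // big bad john
echo recover_trailing_value('john', 'HTTP/1.0'), "\n";         // john
```

    In a script such as view.php this might be called as recover_trailing_value($_GET['nick'] ?? '', getenv('SERVER_PROTOCOL') ?: ''). As with the original snippet, it only works when 'nick' is the last variable in the query string.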

whiteout-ga at Google Answers
