How Can I Write An Anonymizing Proxy In PHP?
-
I would like to make an anonymous proxy using PHP (similar in concept to how http://anonymouse.org works). The difference between my project and Anonymouse is that my project is intended for private group use on a specific group of sites (note that this does not break the TOS of either my awesome webhost or the sites on which I'll be using the proxy).

The proxy needs to be able to do a few things. First and foremost, it needs to be able to retrieve a URL of my choice and store cookies. I have a fairly good idea how to do that. The part I'm having trouble with is URL rewriting (so, for example, <a href="http://www.google.ca"> will be rewritten so it links to http://www.myproxy.com/?link=http://www.google.ca instead of directly to Google) and Javascript stripping (one of the reasons I'm creating this is so my friends can access the site while on a locked-down computer - they can't disable Javascript, and Javascript can be cleverly written to avoid URL rewriting).

Is there any feasible way to ensure that the proxy remains truly anonymous? I've come up with two solutions so far:

1) Somehow strip all Javascript from the document, as well as intercept all incoming URLs and rewrite them. I know how to do this on a basic level, but I'm sure there are cases I'm missing. Any suggestions? (I am aware of the PHP proxy Poxy, but it says that its Javascript stripping is imperfect.)

2) Since I know the sites that this will be used on, I could potentially write the proxy so that it acts as a screen scraper, getting all the useful info from the site itself and writing it out in HTML-escaped form, using its own formatting. However, I'm worried about what might happen if the sites suddenly change layout. Is there a way to scrape HTML effectively so that it's not as sensitive to layout changes?

3) I'm open to any other suggestions on how to write an anonymous PHP proxy (one that runs on a shared host, so unfortunately I can't do any fancy mod_rewrite trickery and simulate a real proxy).

Also, feel free to substitute PHP with Perl, Python, or Ruby (or some other scripting language that can run server-side). I'm asking about PHP because it's the easiest to deploy, but if there are compelling arguments for another language, I'm open to that too!
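A minimal sketch of the URL-rewriting half, written in Python since the question allows it: each link is resolved against the page it came from, percent-encoded into the proxy's `link` parameter, and decoded again when the proxy receives the request. The `myproxy.com` address and the `link` parameter follow the question's example; the function names and the `ALLOWED_HOSTS` allow-list (standing in for "a specific group of sites") are hypothetical.

```python
from urllib.parse import parse_qs, quote, urljoin, urlsplit

# Hypothetical values: the proxy address follows the question's example,
# and the allow-list stands in for "a specific group of sites".
PROXY_ENTRY = "http://www.myproxy.com/"
ALLOWED_HOSTS = {"www.google.ca", "example.com"}


def to_proxy_link(href, page_url):
    """Turn an href found on page_url into a link routed through the proxy."""
    absolute = urljoin(page_url, href.strip())  # resolve relative links first
    if urlsplit(absolute).scheme not in ("http", "https"):
        return None  # javascript:, mailto:, data: etc. are not proxied
    # Percent-encode the whole target so its own ?query and #fragment survive.
    return PROXY_ENTRY + "?link=" + quote(absolute, safe="")


def target_from_query(query_string):
    """Recover the target URL from the proxy's own query string, or None."""
    links = parse_qs(query_string).get("link")
    if not links:
        return None
    target = links[0]
    parts = urlsplit(target)
    if parts.scheme not in ("http", "https") or parts.hostname not in ALLOWED_HOSTS:
        return None  # refuse to act as an open proxy for arbitrary sites
    return target
```

For example, `to_proxy_link("http://www.google.ca", "http://example.com/")` gives `http://www.myproxy.com/?link=http%3A%2F%2Fwww.google.ca`. Encoding the target, rather than pasting it in raw as `?link=http://...`, matters as soon as the target URL carries its own query string: an unencoded `&` or `#` in it would otherwise be read as part of the proxy's URL.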
-
Answer:
There's CGIProxy (http://www.jmarshall.com/tools/cgiproxy/) and http://www.usefulutilities.com/ (which costs money, but the unregistered version can be useful).
mebibyte at Ask.Metafilter.Com
Other answers
A proper HTML parser such as Ruby's Hpricot (http://code.whytheluckystiff.net/hpricot/) would probably be the best approach to screen-scraping with some layout independence and maintainability. Such a parser would also be useful for rewriting: you can walk all the elements and find hrefs, Javascript event handlers, script tags, etc., more easily and accurately than by writing huge regexps to cope with every edge case. Other potentially useful tools for Ruby are http://blog.labnotes.org/category/scrapi/ and http://muharem.wordpress.com/2007/09/04/scrape-the-web-with-ruby/, which might operate at a nicer level for robust screen scraping. There are probably similar libraries for PHP, but I haven't kept up with which ones are popular or good there.
Freaky
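To make the parser suggestion concrete, here is a minimal sketch in Python using only the standard library's `html.parser` (swapped in for Hpricot). It walks every tag, drops `<script>` and `<style>` elements along with inline `on*` handlers and `style` attributes, and rewrites `href`/`src`/`action`/`formaction` through the proxy. The proxy address, the helper names (`proxify`, `ProxyRewriter`, `rewrite_html`) and the attribute lists are assumptions for illustration. It is still a blacklist, so it shares the weakness mentioned for Poxy: it does not cover `<meta http-equiv="refresh">`, `srcset`, `<object>`/`<embed>` plugins, or URLs inside external stylesheets, and an allow-list of known-safe tags and attributes would be the stricter approach.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import quote, urljoin, urlsplit

PROXY_BASE = "http://www.myproxy.com/?link="  # hypothetical proxy address
URL_ATTRS = {"href", "src", "action", "formaction"}  # attributes holding URLs
DROP_ELEMENTS = {"script", "style"}  # dropped together with their contents


def proxify(url, base):
    """Route an absolute or relative URL through the proxy; None if not http(s)."""
    absolute = urljoin(base, url.strip())
    if urlsplit(absolute).scheme not in ("http", "https"):
        return None
    return PROXY_BASE + quote(absolute, safe="")


class ProxyRewriter(HTMLParser):
    """Re-serialize HTML, stripping scripts and rewriting URL attributes."""

    def __init__(self, base):
        super().__init__(convert_charrefs=True)
        self.base = base
        self.out = []
        self.skipping = 0  # > 0 while inside a dropped element

    def _attrs(self, attrs):
        kept = []
        for name, value in attrs:
            if name.startswith("on") or name == "style":
                continue  # inline event handlers, and CSS that may hold url(...)
            if name in URL_ATTRS and value is not None:
                value = proxify(value, self.base)
                if value is None:
                    continue  # e.g. href="javascript:..."
            kept.append(name if value is None else f'{name}="{escape(value)}"')
        return "".join(" " + item for item in kept)

    def handle_starttag(self, tag, attrs):
        if tag in DROP_ELEMENTS:
            self.skipping += 1
        elif not self.skipping:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if tag not in DROP_ELEMENTS and not self.skipping:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag in DROP_ELEMENTS:
            self.skipping = max(0, self.skipping - 1)
        elif not self.skipping:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data, quote=False))

    # Comments, doctypes and processing instructions use HTMLParser's no-op
    # defaults, so they are silently dropped (including IE conditional comments).


def rewrite_html(html, base):
    parser = ProxyRewriter(base)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `rewrite_html('<a href="/next" onclick="track()">next</a>', "http://example.com/page")` returns `<a href="http://www.myproxy.com/?link=http%3A%2F%2Fexample.com%2Fnext">next</a>`. Because the output is rebuilt from parsed tokens rather than patched with regexps, anything the parser does not recognize is simply not emitted.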
There is a CGI proxy written in Perl, available from James Marshall: http://jmarshall.com/tools/cgiproxy/. It can filter out Javascript and restrict access to certain sites. It has fulfilled all of my needs.
DJWeezy
Where will the proxy be running, just out of curiosity? On a web hosting account/domain name registered under your name? I'm just wondering what level of privacy you can hope to achieve in this way. The point of third party privacy service providers is just that, that they're third parties. You're going to be your own privacy service provider?
AmbroseChapel
Poxy (http://sourceforge.net/projects/poxy/) is a PHP port of CGI Proxy. I've used it, and it's good, though it's no longer in active development.
zackola
I can't read. Sorry about that.
zackola
roue, zackola, DJWeezy: Unfortunately, both CGI Proxy and its PHP port (Poxy) state that their Javascript stripping is imperfect. This doesn't work for me :( Thanks anyway, though! Freaky: Interesting. I've never used these libraries before - the HTML that's being generated isn't really all that semantic... there are no IDs or classes to watch out for. Is that a problem if the layout suddenly changes? AmbroseChapel: Yeah, I know that the domain has my registration info on it. That's fine. When I say anonymity, I mean that my actions on this site appear to be coming from my webserver instead of my client computer. I may also run this from one of my own servers, but am not quite sure yet. Thanks!
mebibyte
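If anonymity means that requests appear to come from the web server rather than the client, the main thing the proxy must avoid is passing identifying client data upstream. A minimal Python sketch, assuming the standard library's `urllib.request`: the outgoing request is built from an allow-list of client headers, and cookies set by the target site are kept in a server-side jar instead of the friend's browser. `FORWARDABLE` and `build_upstream_request` are hypothetical names, and which headers count as harmless is an assumption.

```python
import http.cookiejar
from urllib.request import HTTPCookieProcessor, Request, build_opener

# Assumption: only these client headers are passed upstream. Anything else
# (Cookie, Referer, X-Forwarded-For, Via, ...) is dropped, so the target site
# sees the web server's address and nothing else about the client.
FORWARDABLE = {"accept", "accept-language", "user-agent"}


def build_upstream_request(target_url, client_headers):
    """Build the request the proxy itself will send to the target site."""
    headers = {name: value for name, value in client_headers.items()
               if name.lower() in FORWARDABLE}
    return Request(target_url, headers=headers)


# Cookies from the target site live in a jar on the server (in practice you
# would keep one jar per proxy user session).
jar = http.cookiejar.CookieJar()
opener = build_opener(HTTPCookieProcessor(jar))
# opener.open(build_upstream_request(...)) would perform the actual fetch.
```

Note that `urllib.request.Request` normalizes header names with `str.capitalize()`, so `User-Agent` is stored and looked up as `User-agent`.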