Making "pseudo-private" webpages
-
Background: My Internet Service Provider offers space for personal webpages, so occasionally if I want to share something with a friend (for example a collection of images too big to e-mail directly), I create and post a simple temporary "private" webpage in my home directory, or a subdirectory of it, with a name like "hellosusan.html", then e-mail the URL of the page only to Susan. There are no links from anywhere to that page, but I do maintain a simple "index.html" page in my home directory (that does not link to the page). The pseudo-privacy of the webpage is simply that no one else knows what URL to type to get to that page.

I thought that search engines generally can't find such "pseudo-private" pages, since they have no way of knowing the spelling of the URL (particularly the filename of the page) to go to. But a friend of mine thinks that search engines are capable of walking/iterating the entire directory hierarchy of a domain, discovering any and all files and subdirectories in each directory.

While uncommon, I have encountered a few URLs that appear to be an automated listing of the entire directory, with links to each file. I thought I once heard that the presence of an index.html page in the same directory blocks such a listing, but I'm not clear on that point.

I recently started using the tag:

<meta name="robots" content="noindex, nofollow" />

to tell search engines not to index my page if they do happen to find it. But I know that some rogue search engine or bot won't necessarily respect that tag.

I also realize that if I have a link on my page to an external website, a web browser can include the URL of my referring page as a field in the request to the external site.

So that leads to the obvious questions:

1. Are search engines capable of iterating the entire contents of a given known directory in a domain? Or do they have to know the spellings of the files in the directory to find them (for example, due to links from an already-known page)?

2. What causes (or prevents) an automated listing of every file in a directory to appear in a browser? Does the presence or absence of an index.html file in a given directory affect that in any way?

3. Is there any way to edit my webpage to prevent a browser from including its URL as a field in requests to linked sites?

4. Assuming my friend doesn't forward the URL I e-mailed to her to others, are there any other gotchas in how search engines, or other people, could discover my pseudo-private webpage?

Thanks.
-
Answer:
Hi philroy-ga,

Taking your questions in turn:

> 1. Are search engines capable of iterating the entire contents
> of a given known directory in a domain?

In general, search engines cannot discover the entire contents of a directory on your webserver. However, there are ways in which this could be made possible.

For example, IF your ISP was serving the files by anonymous FTP (file transfer protocol) in addition to HTTP (the web's hypertext transfer protocol), AND a link existed from a web page to an FTP address in your website, AND the search engine was a specialized one that wanted to crawl FTP sites (e.g. to compile a list of downloadable files), then the search engine's crawler could request a directory listing by issuing an FTP command.

But it's extremely unlikely that your ISP would be serving your files by anonymous FTP without your knowledge. In the normal situation, a search engine must follow a link to get to your webpage.

> 2. What causes (or prevents) an automated listing of every
> file in a directory to appear in a browser? Does the presence
> or absence of an index.html file in a given directory affect
> that in any way?

An automated directory listing is produced by the webserver only when it is configured to do so. The listing is generated if the user types in (or clicks on a link to) a directory name AND the webserver can't find an ordinary page to serve (or has been instructed not to serve one).

The webserver will look for pages such as index.html, index.php, index.cgi etc and will display the first one that it finds. The webserver will only generate the autoindex if none of these pages are found.

The exact list of pages that the webserver looks for will differ according to how the webserver is configured, but unless your ISP has grossly misconfigured its webserver you can bet that index.html will be one of the pages that the webserver will display in preference to an autoindex.
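The lookup just described can be sketched in a few lines of Python. This is only an illustration of the decision logic, not how any particular webserver is implemented: the function name resolve_directory_request, the INDEX_CANDIDATES list and the autoindex_enabled flag are hypothetical names made up for the example.

```python
import os
import tempfile

# Hypothetical list of index pages; a real server reads this from its configuration.
INDEX_CANDIDATES = ["index.html", "index.php", "index.cgi"]


def resolve_directory_request(directory, autoindex_enabled):
    """Return a (kind, payload) pair describing what a server might send
    when a browser asks for a directory rather than a file."""
    # Serve the first index page that exists, in the configured order.
    for name in INDEX_CANDIDATES:
        if os.path.isfile(os.path.join(directory, name)):
            return ("page", name)
    # No index page found: list the directory only if listings are switched on.
    if autoindex_enabled:
        return ("listing", sorted(os.listdir(directory)))
    # Otherwise nothing is revealed about the directory contents.
    return ("forbidden", None)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # A directory holding only the pseudo-private page.
        open(os.path.join(d, "hellosusan.html"), "w").close()
        print(resolve_directory_request(d, autoindex_enabled=True))
        # Adding an index.html takes priority over the listing.
        open(os.path.join(d, "index.html"), "w").close()
        print(resolve_directory_request(d, autoindex_enabled=True))
```

In this sketch the pseudo-private page shows up in a listing only when there is no index page AND listings are enabled, which is why an index.html file in each directory matters.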
The Apache webserver uses the "DirectoryIndex" directive for this purpose:

"The DirectoryIndex directive sets the list of resources to look for, when the client requests an index of the directory by specifying a / at the end of the directory name ... If none of the resources exist and the Indexes option is set, the server will generate its own listing of the directory."

Apache HTTP Server Documentation
http://httpd.apache.org/docs/2.0/mod/mod_dir.html#directoryindex

> 3. Is there any way to edit my webpage to prevent a browser
> from including its URL as a field in requests to linked sites?

The URL of the page that you are leaving is called the referrer URL. It's not up to the webserver whether this is sent; it's up to the browser. Some browsers can be configured so that they do not send the referrer URL, but this is not usually satisfactory because some web pages depend on the presence of the referrer URL to function properly.

> 4. Assuming my friend doesn't forward the URL I e-mailed to her
> to others, are there any other gotchas in how search engines, or
> other people, could discover my pseudo-private webpage?

You should avoid using guessable names for the HTML files, particularly standard names that are frequently used (such as sitemap.html, login.html etc), because people might construct these URLs directly rather than following links.

You need to also ensure that the webserver statistics for your website are not posted on the web, because they will contain links to your URLs. Similarly, your webserver logs must not be published on the web.

Make sure too that the intended viewer does not bookmark your URLs publicly, for example by using a social bookmarking site such as http://del.icio.us/

Make sure that your ISP is not participating in any scheme to promote your URLs (e.g. by submitting URLs to search engines, or by submitting a sitemap to a service such as Google Sitemaps).

To summarise: you can make your pages "pseudo-private" by:

1. Not divulging the URL to anyone except the intended viewer
2. Trusting the intended viewer to do likewise
3. Having an index.html file in each pseudo-private directory
4. Turning off the sending of referrers by your browser
5. Keeping your stats off the web
6. Using the robots meta-tag to keep the honest robots out (and a robots.txt file if your hosting arrangements permit)

When all is said and done, it seems a lot more straightforward to forget about "pseudo-private" and go for password-protected. You can then email a password to the desired recipients.

The procedure to set up a password-protected directory will differ according to which kind of webserver your ISP is using, and may not be possible with all ISPs. However, it is often straightforward, and will certainly keep the search engine crawlers out.

If that is not possible, you could consider an online service that allows you to create webpages to be shared with people who you invite, for example:

MyFamily
http://www.myfamily.com/

I trust this answers your questions. If not, feel free to request clarification.

Regards,
eiffel-ga

Google Search Strategy:

apache autoindex
://www.google.com/search?hl=en&q=apache+autoindex

"private website"
://www.google.com/search?hl=en&q=%22private+website%22

"keep out search engines"
://www.google.com/search?hl=en&q=%22keep+out+search+engines%22
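On the advice about avoiding guessable file names: one way to follow it is to let a program pick the name. The sketch below is only a suggestion under stated assumptions. The helper unguessable_page_name and its prefix parameter are hypothetical names made up for this example, and it relies on Python's standard secrets module to produce the random part.

```python
import secrets


def unguessable_page_name(prefix="page", nbytes=16):
    """Build a file name whose random part cannot practically be guessed."""
    # token_urlsafe returns URL-safe text drawn from a cryptographic random source.
    token = secrets.token_urlsafe(nbytes)
    return f"{prefix}-{token}.html"


if __name__ == "__main__":
    # Prints a different name on every run.
    print(unguessable_page_name("photos"))
```

A name like this only guards against guessing. It does nothing about the other routes mentioned above, such as published statistics, logs, or public bookmarks.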
philroy-ga at Google Answers