What is mirroring a site?

Website Mirroring Tools

  • Question: I usually use wget (http://www.gnu.org/software/wget/wget.html), but it seems to have at least one nasty flaw: it doesn't fetch stylesheets, or images named in stylesheets (it may also have other flaws I'm not aware of). Is there something better out there that I can just give a homepage URL and have it suck down an entire site?

  • Answer:

    http://www.opal.dhs.org/programs/omt/index.oml: "OMT is a simple script for mirroring Web pages for off-line/mirror reading. It rewrites the content of the pages to make a complete and functional mirror. It has a number of options to specify what files should be mirrored and what renaming should occur."

    http://www.httrack.com/: "It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer."

    http://langfeldt.net/w3mir/: "w3mir supports HTML4, and has partial support for CSS, Java and ActiveX. And it should work on Win32 machines."

    Generally, I just use wget -m -np -t 0 -c -p URL, but I don't really care that much if I'm missing a couple of stylesheets.

namespan at Ask.Metafilter.Com
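
The wget one-liner in the answer above is easier to audit when spelled with long option names. A minimal sketch, assuming GNU wget is on the PATH; the function name mirror_site and the example.com URL are placeholders, not something from the thread:

```shell
# mirror_site: hypothetical wrapper; long-option form of
#   wget -m -np -t 0 -c -p URL
mirror_site() {
    # --mirror           recursive download with timestamping, unlimited depth
    # --no-parent        never ascend above the starting directory
    # --tries=0          retry without limit
    # --continue         resume partially downloaded files
    # --page-requisites  also fetch images, stylesheets, etc. needed by each page
    wget --mirror --no-parent --tries=0 --continue --page-requisites "$1"
}

# Example (placeholder URL):
# mirror_site http://example.com/docs/
```

Adding --convert-links (-k) rewrites links in the saved pages so they work when browsed offline; whether you want it probably depends on whether the copy is for local reading or for re-serving.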

Other answers

I'm using wget version 1.8.2, and it has a -p or --page-requisites option. According to the man page:

    This option causes Wget to download all the files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets. Ordinarily, when downloading a single HTML page, any requisite documents that may be needed to display it properly are not downloaded. Using -r together with -l can help, but since Wget does not ordinarily distinguish between external and inlined documents, one is generally left with "leaf documents" that are missing their requisites.

Perhaps that'll do what you want? I don't know if it will get images mentioned in a .css though. But hey, I bet it will get embedded MIDI files, and that's almost as good.

mragreeable
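
On the open question of images named in a .css file: if your wget leaves them behind, one workaround is to list the url(...) targets in each downloaded stylesheet and fetch those separately. A rough sketch, assuming a grep that supports -o (GNU and BSD grep do); css_urls is a made-up helper name, and the pattern ignores edge cases such as parentheses inside quoted URLs:

```shell
# css_urls: hypothetical helper. Reads CSS on stdin and prints the target of
# every url(...) reference, one per line, with surrounding quotes removed.
css_urls() {
    grep -o 'url([^)]*)' |
        sed -e 's/^url(//' \
            -e 's/)$//' \
            -e "s/^[\"' ]*//" \
            -e "s/[\"' ]*\$//"
}

# Example: list what a saved stylesheet refers to.
# css_urls < style.css
```

The printed paths are relative to the stylesheet's own location, so they would need to be resolved against the stylesheet URL before being handed back to wget.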

<rueful caveat> just be damn careful about trying to use wget on, say, a geocities site, especially if you don't fully understand the ludicrous plethora of options, each of which is expressible in alternate ways, not to mention the geocities habit of indecipherably spreading various site bits across multiple hosts.

quonsar

So what are everybody's favorite pr0n sites?

coolgeek

coolgeek: I'm pretty sure you're just being snarky or trying to be funny, but http://ask.metafilter.com/mefi/4173.

majick

How do I use wget to just strip mine .mp3s from a site?

mecran01
