Website Mirroring Tools
I usually use wget (http://www.gnu.org/software/wget/wget.html), but it seems to have at least one nasty flaw: it doesn't fetch stylesheets, or images named in stylesheets (it may also have other flaws I'm not aware of). Is there something better out there that I can just give a homepage URL and have it suck down an entire site?
Answer:
http://www.opal.dhs.org/programs/omt/index.oml: "OMT is a simple script for mirroring Web pages for off-line/mirror reading. It rewrites the content of the pages to make a complete and functional mirror. It has a number of options to specify what files should be mirrored and what renaming should occur."

http://www.httrack.com/: "It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer."

http://langfeldt.net/w3mir/: "w3mir supports HTML4, and has partial support for CSS, Java and ActiveX. And it should work on Win32 machines."

Generally, I just use wget -m -np -t 0 -c -p URL, but I don't really care that much if I'm missing a couple of stylesheets.
namespan at Ask.Metafilter.Com
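For reference, the wget invocation in the answer above can be spelled out with long-form options. This is only a minimal sketch: the function name mirror_site and the URL http://example.com/ are made up for illustration, and the options are the ones from the answer, nothing more.

```shell
#!/bin/sh
# mirror_site is a hypothetical helper name; the options are the long forms
# of the answer's "wget -m -np -t 0 -c -p URL".
mirror_site() {
    # --mirror           (-m)    recursion plus timestamping, suited to mirroring
    # --no-parent        (-np)   never ascend above the starting directory
    # --tries=0          (-t 0)  retry indefinitely
    # --continue         (-c)    resume partially downloaded files
    # --page-requisites  (-p)    also fetch images, stylesheets, etc. that pages need
    wget --mirror --no-parent --tries=0 --continue --page-requisites "$1"
}

# Example with a placeholder URL:
# mirror_site http://example.com/
```

If the copy is meant for offline browsing, adding --convert-links (-k) asks wget to rewrite links in the saved pages to point at the local files; that flag is not part of the command quoted above.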
Other answers
I'm using wget version 1.8.2, and it has a -p or --page-requisites option. According to the man page:

"This option causes Wget to download all the files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets. Ordinarily, when downloading a single HTML page, any requisite documents that may be needed to display it properly are not downloaded. Using -r together with -l can help, but since Wget does not ordinarily distinguish between external and inlined documents, one is generally left with "leaf documents" that are missing their requisites."

Perhaps that'll do what you want? I don't know if it will get images mentioned in a .css though. But hey, I bet it will get embedded MIDI files, and that's almost as good.
mragreeable
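Whether a given wget version follows references inside a .css file is left open above. One way to check is to list the url(...) references in a downloaded stylesheet and compare them with what actually arrived. A rough sketch, assuming a grep that supports -o (GNU and BSD grep do); css_urls is a hypothetical helper name, and the pattern does not handle whitespace inside url( ) or @import rules written as a bare string.

```shell
#!/bin/sh
# css_urls (hypothetical name): read CSS on stdin and print each url(...) target,
# one per line, with the url( ) wrapper and any surrounding quotes removed.
css_urls() {
    grep -o 'url([^)]*)' |
        sed -e 's/^url(//' -e 's/)$//' -e "s/^[\"']//" -e "s/[\"']\$//"
}

# Example with a placeholder file name:
# css_urls < style.css
```

Any path in the output that has no matching file in the mirror is a candidate for fetching by hand.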
<rueful caveat> Just be damn careful about trying to use wget on, say, a geocities site, especially if you don't fully understand the ludicrous plethora of options, each of which is expressible in alternate ways, not to mention the geocities habit of indecipherably spreading various site bits across multiple hosts.
quonsar
Install Firefox (http://www.mozilla.org/products/firefox/), then install SpiderZilla (http://spiderzilla.mozdev.org/).
trondant
So what are everybody's favorite pr0n sites?
coolgeek
coolgeek: I'm pretty sure you're just being snarky or trying to be funny, but see http://ask.metafilter.com/mefi/4173.
majick
How do I use wget to just strip mine .mp3s from a site?
mecran01
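For the .mp3 question above, wget's accept list is one possible approach. A minimal sketch, assuming the files are linked from pages at or below the starting URL on the same host; fetch_mp3s is a hypothetical helper name and the URL is a placeholder.

```shell
#!/bin/sh
# fetch_mp3s (hypothetical name): recursively crawl from a starting URL
# and keep only the .mp3 files.
fetch_mp3s() {
    # --recursive       (-r)   follow links from the starting page
    # --no-parent       (-np)  stay below the starting directory
    # --no-directories  (-nd)  save files flat instead of recreating the tree
    # --accept=mp3      (-A)   keep only files whose names end in .mp3
    wget --recursive --no-parent --no-directories --accept=mp3 "$1"
}

# Example with a placeholder URL:
# fetch_mp3s http://example.com/music/
```

wget still has to download the HTML pages to discover the links; with an accept list in place it should discard them afterwards rather than keep them.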