How can I download and archive a blog or website in plain text?
-
There are some blogs and websites I would like to read in plain text, or even HTML; I just don't want to click through hundreds of pages, and I definitely don't want to download them one by one. Some of them also disappear forever after a while, and I am not exactly going to Evernote a hundred screenshots. I can't seem to find any tools that straightforwardly do what I want: I've found a couple that will convert the first few pages of a website into a Kindle-friendly format, but no more than that. This seems like it should be simple, but I can't find anything. Paid software or services are fine.
-
Answer:
Archive-It (https://archive-it.org/learn-more), built, I believe, by the same people who maintain the Wayback Machine (https://archive.org/web/), does exactly what you described: "Archive-It enables you to capture, manage and search collections of digital content without any technical expertise or hosting facilities." (http://www.archive-it.org/)
ziggly at Ask.Metafilter.Com
Other answers
This page could be helpful: http://danwin.com/2010/04/coding-for-journalists-go-from-a-know-nothing-to-web-scraper-in-an-hour-hopefully/.
soelo
I believe you can use Lynx (http://lynx.browser.org/) to do this. The -dump option writes the formatted text of a web page to standard output, which you can redirect to a file; specifically, check out the -crawl and -traversal options. The man page entry for -traversal says: "traverse all http links derived from startfile. When used with -crawl, each link that begins with the same string as startfile is output to a file, intended for indexing. See CRAWL.announce for more information." Note that a lot of web pages these days just load skeleton HTML and then fetch the rest with JavaScript/AJAX-style calls, and Lynx doesn't run JavaScript.
straw
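The Lynx approach above (follow every link that begins with the same string as the start URL and keep only the text) can also be sketched in Python using just the standard library. This is a minimal, hypothetical sketch rather than part of the original answer: the names `parse_page` and `crawl`, the `archive` output directory, the `max_pages` cap and the one-second `delay` are all assumptions. Like Lynx, it only sees static HTML and does not execute JavaScript.

```python
"""Sketch of a Lynx-style "-crawl -traversal" run in Python: start at one URL,
follow only links that begin with the same string, save each page as text.
Static HTML only; like Lynx, this does not execute JavaScript."""
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin


class PageParser(HTMLParser):
    """Collects absolute link targets and visible text, skipping <script>/<style>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL and drop "#fragments".
                self.links.append(urldefrag(urljoin(self.base_url, href))[0])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def parse_page(html, base_url):
    """Return (links, plain_text) for one HTML document."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, "\n".join(parser.text_parts)


def crawl(start_url, out_dir="archive", max_pages=500, delay=1.0):
    """Breadth-first crawl of pages whose URL starts with start_url."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    queue, seen = deque([start_url]), {start_url}
    saved = 0
    while queue and saved < max_pages:
        url = queue.popleft()
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                if resp.headers.get_content_type() != "text/html":
                    continue  # skip images, PDFs, feeds, etc.
                charset = resp.headers.get_content_charset() or "utf-8"
                html = resp.read().decode(charset, errors="replace")
        except (OSError, LookupError) as exc:  # network errors, unknown charset
            print(f"skipping {url}: {exc}")
            continue
        links, text = parse_page(html, url)
        saved += 1
        (out / f"page{saved:05d}.txt").write_text(f"{url}\n\n{text}\n", encoding="utf-8")
        for link in links:
            # Same prefix rule as the -traversal description quoted above.
            if link.startswith(start_url) and link not in seen:
                seen.add(link)
                queue.append(link)
        time.sleep(delay)  # be gentle with the server
    return saved
```

For example, `crawl("https://someblog.example.com/")` (a placeholder URL) would save up to 500 pages under that prefix as numbered `.txt` files, each starting with its source URL. Matching on the plain string prefix keeps the crawl inside the blog, at the cost of missing posts hosted under a different path or domain.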
For low-traffic blogs that I know I want to read and that publish an RSS feed, I create an IFTTT (https://ifttt.com/) recipe in the Instapaper (http://instapaper.com/) channel to save every new post there. I believe you can also add keyword matching to that recipe to limit what gets saved.
These Premises Are Alarmed
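The same idea (save every post a blog publishes through its RSS feed) can be sketched without IFTTT or Instapaper. This is an illustrative sketch under stated assumptions: the feed is standard RSS 2.0 (Atom feeds use different element names and are not handled), the full post body may be in `<content:encoded>` or only a summary in `<description>`, and the names `parse_rss`, `save_new_posts`, `archive_feed` and the `feed_archive` directory are hypothetical.

```python
"""Sketch: archive every post in an RSS 2.0 feed as plain-text files.
Assumes RSS 2.0 (<rss><channel><item>); Atom feeds are not handled."""
import hashlib
import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from pathlib import Path

# Namespaced tag many blogs use for the full post body.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


class TextExtractor(HTMLParser):
    """Collects the text of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    extractor = TextExtractor()
    extractor.feed(fragment)
    extractor.close()
    # Collapse whitespace runs left over from removed tags and line breaks.
    return " ".join("".join(extractor.parts).split())


def parse_rss(xml_bytes):
    """Return (title, link, plain_text) for every <item> in the feed."""
    root = ET.fromstring(xml_bytes)
    posts = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        # Prefer the full body; fall back to the (often truncated) summary.
        body_html = item.findtext(CONTENT_ENCODED) or item.findtext("description") or ""
        posts.append((title, link, html_to_text(body_html)))
    return posts


def save_new_posts(posts, out_dir="feed_archive"):
    """Write each post to its own .txt file, skipping ones already saved."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    saved = 0
    for title, link, body in posts:
        # File name derived from the link, so re-running doesn't duplicate posts.
        name = hashlib.sha1(link.encode("utf-8")).hexdigest()[:16] + ".txt"
        path = out / name
        if not path.exists():
            path.write_text(f"{title}\n{link}\n\n{body}\n", encoding="utf-8")
            saved += 1
    return saved


def archive_feed(feed_url, out_dir="feed_archive"):
    """Fetch one feed and save any posts not already archived."""
    with urllib.request.urlopen(feed_url, timeout=30) as resp:
        return save_new_posts(parse_rss(resp.read()), out_dir)
```

Because a feed usually lists only the most recent posts, a script like this would need to run on a schedule (for example from cron) to catch everything going forward, much like the IFTTT recipe does.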