Creating an ebook from a wiki
-
How would one go about exporting a whole MediaWiki installation as an ebook?

What I have:
- an XML dump (http://meta.wikimedia.org/wiki/Help:Export) of the Wikia wiki I want to read (~18 MB file, 5000+ articles, CC-BY-SA licensed)
- a local MediaWiki 1.16.5 installation with the above dump imported
- access to all the Unixy tools you'd expect

What I want:
- a way to read the wiki offline on my Kindle, presumably as a proper .mobi eBook; EPUB or anything else easily convertible to Mobipocket (i.e. not PDF) is fine too
- any internal links should work as expected; along with search, they will be the main means of navigating the ebook
- inline images would be fantastic, but a) they aren't critical to this particular wiki, b) they aren't included in the dump and would need to be fetched separately, and c) they would further increase the size of an already hefty file
- as little MediaWiki-specific content as possible (such as Edit links and page meta info), but this is also a secondary priority
- ideally, the process should be simple, repeatable on different wikis and require as little human interaction as possible

What I tried:
- straight up converting the XML dump to an HTML file
  - the main problem here is that MediaWiki markup is a pain to parse and none of the libraries I looked at provided satisfactory results:
    - https://github.com/nricciar/wikicloth (Ruby) is pretty fast and seems to provide all the hooks I need, but the git HEAD hasn't been updated in months and is failing several test cases; in particular, list tags are not properly closed, which causes all kinds of horrible nesting issues
    - https://github.com/rdblue/marker (Ruby) provides better output, but is super slow and has a less flexible API, which means I'd need to do additional post-processing on the HTML to get links and the like to work
  - the language doesn't matter much, but it seems that the situation with Perl and Python libraries isn't much better
  - another issue is with templates, which would require a lot of extra work
- using one of the available MediaWiki tools (see http://www.mediawiki.org/wiki/Alternative_parsers) to export the content of my local install in a more convenient format:
  - http://www.mediawiki.org/wiki/Extension:EPubExport chokes and times out when provided with the full list of pages, even with the corresponding time/memory limits raised
  - http://code.pediapress.com/wiki/wiki/Examples seems to only operate on single pages

Right now I'm leaning toward using the http://www.mediawiki.org/wiki/Extension:DumpHTML extension (if it's still compatible with the latest MediaWiki; reports vary) to get 5000+ static HTML files and then writing a script that would extract the content subsection of each page, rewriting all headers and internal links to use anchors. I would then glue the output together and run it through pandoc or similar. Is there a better way?
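
For illustration, the DumpHTML plan described above (extract each page's content block, rewrite headers and internal links to anchors, glue everything together) could look roughly like the following Python sketch. It uses only the standard library. Several things here are assumptions rather than facts about DumpHTML: that the article body sits in a div with id "bodyContent", that internal links are plain relative file names such as "Foo.html", and the helper names anchor_for, ContentExtractor and build_book, which are made up for this example.

```python
import html
import re
from html.parser import HTMLParser


def anchor_for(filename):
    """Turn a page file name such as 'Main_Page.html' into an anchor id."""
    stem = re.sub(r"\.html?$", "", filename.rsplit("/", 1)[-1])
    return "page-" + re.sub(r"[^A-Za-z0-9_-]", "_", stem)


def render_tag(tag, attrs, self_closing=False):
    """Serialize a start tag from its name and an attribute dict."""
    parts = [tag]
    for name, value in attrs.items():
        if value is None:
            parts.append(name)
        else:
            parts.append('%s="%s"' % (name, html.escape(value, quote=True)))
    return "<" + " ".join(parts) + (" />" if self_closing else ">")


def demote(tag):
    """Shift h1..h5 down one level so page titles can own h1."""
    if re.fullmatch(r"h[1-5]", tag):
        return "h%d" % (int(tag[1]) + 1)
    return tag


class ContentExtractor(HTMLParser):
    """Collects the inner HTML of one div and rewrites internal links."""

    def __init__(self, content_id, known_pages):
        super().__init__(convert_charrefs=True)
        self.content_id = content_id    # assumed id of the content div
        self.known_pages = known_pages  # file names that are part of the book
        self.depth = 0                  # div nesting depth inside the content
        self.out = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            if tag == "div" and attrs.get("id") == self.content_id:
                self.depth = 1
            return
        if tag == "div":
            self.depth += 1
        if tag == "a":
            # Section fragments are dropped: the link goes to the page start.
            target = (attrs.get("href") or "").split("#", 1)[0]
            if target in self.known_pages:
                attrs["href"] = "#" + anchor_for(target)
        self.out.append(render_tag(demote(tag), attrs))

    def handle_startendtag(self, tag, attrs):
        if self.depth > 0:
            self.out.append(render_tag(tag, dict(attrs), self_closing=True))

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                return  # closing tag of the content div itself
        self.out.append("</%s>" % demote(tag))

    def handle_data(self, data):
        if self.depth > 0:
            self.out.append(html.escape(data, quote=False))


def build_book(pages, content_id="bodyContent"):
    """pages maps file name -> full HTML; returns one combined HTML document."""
    chapters = []
    for name in sorted(pages):
        parser = ContentExtractor(content_id, pages.keys())
        parser.feed(pages[name])
        parser.close()
        title = html.escape(anchor_for(name)[len("page-"):].replace("_", " "))
        chapters.append('<h1 id="%s">%s</h1>\n%s'
                        % (anchor_for(name), title, "".join(parser.out)))
    return ('<html><head><meta charset="utf-8"></head><body>\n'
            + "\n".join(chapters) + "\n</body></html>")
```

The pages dict would be filled by reading the static HTML files from disk. The combined document can then be written out as, say, book.html and handed to pandoc (for example `pandoc book.html -o book.epub`). Whether a single file holding 5000+ articles converts comfortably is something this sketch does not address.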
-
Answer:
Could you email the folks behind whichever resource looks most promising and ask them how updates and fixes are coming along? The first one I checked, wikicloth, appears to have a Google-able author. If someone emailed me asking for an update to something I had made, I'd probably be pleased that they cared enough to ask, along the lines of: "Hey! This thing that feels obscure to me wasn't useless after all!"
dmit at Ask.Metafilter.Com