Turning HTML into a book?
-
I'm trying to turn someone's blog into annual books for them as a gift (not Christmas). What's the best way to do this? The book printer wants a PDF. I've already captured the HTML for the blog into files, and have written a Perl script to parse the HTML into a basic structured text file that identifies the title, date, and other blog metadata in a standardized way.

My specific question: how can I import this text (with some HTML content in the bodies) into a word processor or other page layout program so that (a) the (simple) HTML formatting inside the blog content is preserved, and (b) styles are automatically assigned to the title, date and such so that I don't have to do it manually?

Constraints:
- I'm flexible about the layout software, but if it's not MS Word 2003 or Publisher, it needs to be free (and able to handle 200-300 page manuscripts in a single file).
- The structure of the file to import is flexible, since I'm creating it... for example, creating XML would not be difficult.
- I'm aware there are numerous HTML -> PDF options, but I really want to use a WYSIWYG style layout program that supports TOC creation, page numbering, and so on.
- The work to automate can't be too elaborate, since this is a one-off (there are about 600 blog entries spread out over three books).

Any suggestions?
-
Answer:
You could use one of the tools listed at http://wiki.docbook.org/topic/ConvertOtherFormatsToDocBook to convert your HTML to DocBook, and then one of the tools listed at http://wiki.docbook.org/topic/DocBookPublishingTools to convert your DocBook to PDF.
reborndata at Ask.Metafilter.Com
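Since the asker says creating XML would not be difficult, another route into DocBook is to write it directly from the parsed entries instead of converting the HTML. Below is a minimal sketch in Python, not a tested pipeline. The entry fields (`title`, `date`, `paragraphs`) and the function name are hypothetical, the `role="date"` marker is just one way to tag the date for later styling, and bodies are assumed to be already split into plain-text paragraphs.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"


def entries_to_docbook(book_title, entries):
    """Build a DocBook 5 <book> with one <chapter> per blog entry.

    `entries` is a list of dicts with hypothetical keys:
    'title', 'date', and 'paragraphs' (a list of plain-text strings).
    """
    # Serialize DocBook elements without a namespace prefix.
    ET.register_namespace("", DOCBOOK_NS)

    def tag(name):
        return f"{{{DOCBOOK_NS}}}{name}"

    book = ET.Element(tag("book"), {"version": "5.0"})
    ET.SubElement(book, tag("title")).text = book_title
    for entry in entries:
        chapter = ET.SubElement(book, tag("chapter"))
        ET.SubElement(chapter, tag("title")).text = entry["title"]
        # Mark the date paragraph so a stylesheet can treat it specially.
        date_para = ET.SubElement(chapter, tag("para"), {"role": "date"})
        date_para.text = entry["date"]
        for paragraph in entry["paragraphs"]:
            ET.SubElement(chapter, tag("para")).text = paragraph
    # ElementTree escapes &, < and > in text for us.
    return ET.tostring(book, encoding="unicode")


if __name__ == "__main__":
    sample = [{"title": "First post", "date": "1 January",
               "paragraphs": ["Hello, world."]}]
    print(entries_to_docbook("Sample blog book", sample))
```

The resulting file could then be fed to one of the DocBook publishing tools mentioned above. Inline HTML such as emphasis would still need mapping to DocBook elements, which this sketch leaves out.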
Other answers
In my defense I did download a couple of blogs. One came into Word as the entire 3 years of blogging with each field associated with its particular style. 300+ pages with images, but sadly no comments. Maybe the step I missed to tell you about was that you must save the imported blog as a Word document (not as HTML).
cabb-chase
Word will open HTML files, so why not just do that, then save as a .doc and work on it in Word? You'll probably have to be a bit wary of its interpretation of HTML (you might have to tweak your Perl script and re-export a couple of times, perhaps), but it should work.
AmbroseChapel
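For the open-it-in-Word route, one way to get styles assigned automatically is to emit a single HTML file in which the title and date are heading elements, since Word generally maps h1 through h6 onto its built-in Heading styles, which can then be restyled once and used for the TOC. A minimal sketch in Python, under stated assumptions: the entry fields (`title`, `date`, `body_html`) and the function name are hypothetical, using h2 for the date is my choice rather than anything from the thread, and the bodies are assumed to contain already-clean simple HTML.

```python
from html import escape


def entries_to_html(book_title, entries):
    """Join blog entries into one HTML document for opening in Word.

    `entries` is a list of dicts with hypothetical keys:
    'title', 'date' (plain text) and 'body_html' (trusted simple HTML).
    """
    parts = [
        "<html><head>",
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">',
        f"<title>{escape(book_title)}</title>",
        "</head><body>",
    ]
    for entry in entries:
        # Headings become Word heading styles; escape the plain-text fields.
        parts.append(f"<h1>{escape(entry['title'])}</h1>")
        parts.append(f"<h2>{escape(entry['date'])}</h2>")
        # The body is passed through unchanged to keep its inline formatting.
        parts.append(entry["body_html"])
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    sample = [{"title": "First post", "date": "1 January",
               "body_html": "<p>Hello, <em>world</em>.</p>"}]
    with open("book.html", "w", encoding="utf-8") as handle:
        handle.write(entries_to_html("Sample blog book", sample))
```

After opening the file in Word and saving it as a .doc, a TOC limited to heading level 1 would list only the entry titles and skip the dates.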
latex!! it's perfect - assuming it's just text, and not heavy with images or other stuff. it's not wysiwyg, in that you don't have precise control over the style. it works by 'compiling' tagged text into beautifully typeset documents. table of contents and sections and stuff are trivial. it is built for automation and large documents. I really think it's exactly what you need. many free/open source tools exist. you could probably do the whole thing with one script. see http://www.w3.org/Tools/html2things.html, http://www.iwriteiam.nl/html2tex.html, and other stuff http://www.google.ca/search?q=html+to+latex&start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org.mozilla:en-US:official
PercussivePaul
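To give a feel for the "one script" idea in the LaTeX suggestion, here is a minimal sketch in Python that turns one parsed entry into a LaTeX section. It is an illustration under assumptions, not one of the linked converters: the entry fields (`title`, `date`, `body_html`) and all function names are hypothetical, and only p, em, i, strong and b tags are handled. Any other tag is dropped and its text kept.

```python
from html.parser import HTMLParser

# Characters that have special meaning in LaTeX and their escaped forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def latex_escape(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


class SimpleHtmlToLatex(HTMLParser):
    """Translate a handful of inline HTML tags into LaTeX commands."""

    OPEN = {"em": r"\emph{", "i": r"\emph{",
            "strong": r"\textbf{", "b": r"\textbf{"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.OPEN:
            self.out.append(self.OPEN[tag])

    def handle_endtag(self, tag):
        if tag in self.OPEN:
            self.out.append("}")
        elif tag == "p":
            # A blank line ends a paragraph in LaTeX.
            self.out.append("\n\n")

    def handle_data(self, data):
        self.out.append(latex_escape(data))


def html_to_latex(body_html):
    parser = SimpleHtmlToLatex()
    parser.feed(body_html)
    parser.close()
    return "".join(parser.out).strip()


def entry_to_latex(entry):
    """Render one entry as a section: title, italic date, then the body."""
    return (
        f"\\section{{{latex_escape(entry['title'])}}}\n"
        f"\\textit{{{latex_escape(entry['date'])}}}\n\n"
        f"{html_to_latex(entry['body_html'])}\n"
    )


if __name__ == "__main__":
    print(entry_to_latex({"title": "First post", "date": "1 January",
                          "body_html": "<p>Hello, <em>world</em>.</p>"}))
```

Wrapping the concatenated sections in a document preamble with `\tableofcontents` would give the TOC and page numbering for free, at the cost of the WYSIWYG editing the asker wanted.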
You seem like a programmer type - if so, you may find http://www.fpdf.org/ to be useful for this purpose. It's a minimalist PHP library for writing PDFs that contains very intuitive handling of margins, line breaks, page breaks, and standard headers & footers (the things I originally assumed would be very complicated when I first experimented with PDF generation). CSS-like text styles would be handled by writing methods that set font characteristics and so on.
migurski
Not exactly what you're looking for, but http://www.blurb.com/create/book/blogbook claims to do all this automagically (but is presumably locked into their book-making service).
stavrosthewonderchicken
how about openoffice? i know it can export straight to PDF and it certainly fits your free criteria.
moochoo
IronLizard
You might like this option... get the new Internet Explorer 7, then click on the pull-down menu "Page" and select "Edit with Microsoft Word". It's magic! It loads the whole thing into Word, preserving most of the formatting and images. For the blog I tried, it was as good as I could hope for.
cabb-chase
I can't tell if the advice above is meant to be a joke or not. If so, stop. If not, I'm sorry, but please lurk more, cabb-chase. That's fucking cretinous.
stavrosthewonderchicken