What's the best way to convert a Microsoft Word document to clean unstyled HTML?
-
Goals and constraints: I'd like to take a Word doc, maintain only basic styling, and get very clean HTML or, ideally, Markdown out. It also has to be automatable via an API, script, Mac program, etc. (i.e. no Windows programs or pastable HTML forms). It should be clean enough that my grandmother could view the source and recognize it as her own delicious pie recipe.

Here is what I'd like to maintain (anything else is gravy):
- Chapter headings (or things with large font treated the same way) (h1, h2, h3)
- Paragraphs (p)
- Lists (ol, ul)
- Links (a)
- Bold (b)
- Tables (if possible)

I don't want any CSS... just HTML tags.

It's ok if the user has some minimal work to do: I'm ok with a solution that requires some tagging by the user (e.g. chapter headings), but I would much prefer a tagging solution that could find other instances (e.g. if I mark one section as header 1, it looks for others with the same font attributes) and that wouldn't require the user to tag until their fingers bleed (think #Chapter1 as good, but <p>My laborious task.</p><p>My laborious task.</p> as bad).

What I've already found:
- I've seen web tools like: http://word2cleanhtml.com/
- I've played around with Pandoc: http://johnmacfarlane.net/pandoc/
- I've even used Google Docs (it does a pretty good job, and you can API-a-tize it, but I'm not yet convinced it's the golden path).

I've heard about, but not yet tried:
- http://wordoff.org/api
- http://www.w3.org/People/Raggett/tidy/ / http://tidy.sourceforge.net/
- Dreamweaver Export (wild!)

So, I ask you now, friends, Romans, countrymen of the Internet, how would you do this? Heck, how would you even approach this problem? Maybe I'm doing it all wrong!
-
Answer:
Email the Word file to yourself in Gmail as an attachment, then open the attachment using "View as HTML" and save or share the result. The same can be done with .pdf files too.
Anonymous at Quora
Other answers
I'm a Roman*, so I'll tell you what I use**: LibreOffice, which reads all those Microsoft files and saves into a lot of different formats. It costs you nothing to try. You can even edit PDF files. http://libreoffice.org
* Rome, NY.
** By necessity, because I run Linux.
James Van Damme
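Since the question asks for something scriptable, here is a minimal sketch of driving LibreOffice from a script rather than through its GUI. It assumes the `soffice` binary is on the PATH (on a Mac it usually lives inside the application bundle, so the path may need adjusting), and the helper names `build_convert_command` and `convert_to_html` are made up for illustration. The output is LibreOffice's own HTML export, which will still carry styling that needs cleaning up afterwards.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, binary="soffice"):
    """Return the argument list for a headless LibreOffice HTML export."""
    return [
        binary,
        "--headless",             # run without opening a window
        "--convert-to", "html",   # target format
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def convert_to_html(doc_path, out_dir="."):
    """Run the conversion and return the path where the HTML should appear."""
    # Assumption: the binary is reachable as "soffice"; adjust if it is not.
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found on PATH")
    subprocess.run(build_convert_command(doc_path, out_dir), check=True)
    return Path(out_dir) / (Path(doc_path).stem + ".html")
```

Calling `convert_to_html("recipe.docx", "out")` would be expected to leave `out/recipe.html` behind; `recipe.docx` is just a placeholder file name.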
Hi, you can save from Word as filtered HTML. Have you tried that? It removes all the Office-specific tags. There are some font and style definitions left in the HTML, but it is pretty clean.
Brian Phillips
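The leftover font and style definitions mentioned above could be stripped in a second pass. The following is a rough sketch using only Python's standard library: it keeps the tags listed in the question (plus the table row and cell tags, which are an assumption), drops every attribute except `href` on links, and discards `style`, `script`, and `head` content. The names `TagWhitelist` and `clean_html` are hypothetical, and this is not a full HTML sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Tags from the question's wish list; tr/td/th/li are added so lists and tables survive.
ALLOWED = {"h1", "h2", "h3", "p", "ol", "ul", "li", "a", "b",
           "table", "tr", "td", "th"}
# Elements whose text content should be thrown away entirely.
SKIP_CONTENT = {"style", "script", "head"}


class TagWhitelist(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a SKIP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED and not self.skip_depth:
            href = dict(attrs).get("href") if tag == "a" else None
            if href:
                # Keep only the link target; every other attribute is dropped.
                self.out.append('<a href="%s">' % escape(href, quote=True))
            else:
                self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED and not self.skip_depth:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_html(html):
    """Return html with only whitelisted, attribute-free tags kept."""
    parser = TagWhitelist()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `clean_html('<p class="MsoNormal"><span style="font-size:12pt">Mix <b>two</b> cups</span></p>')` returns `<p>Mix <b>two</b> cups</p>`. Tags outside the whitelist are removed but their text is kept, which is why the `span` wrapper disappears without losing the sentence.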
Unfortunately, Wordoff has stopped working. Use http://www.html-cleaner.com instead. It's a very user-friendly way of converting Word documents to clean HTML!
John Johnson
Aptara and Innodata are the partners Inkling used to do this to make hundreds of super clean HTML books. Even with 50+ amazing engineers, the most efficient way to do this with the highest quality was with people + technology (ticketing systems on the content, revision control, and individual logins made process improvements possible). There were too many variables and things that went wrong to be able to efficiently do it automatically (at least in 2012) with the quality you're talking about. Heck, even having the original InDesign files that the print book was made from didn't help much. Clean HTML and great page layout are just very different things. Try printing out a random complex webpage: the mess that's produced will give you a sense of just how different HTML and print are when you dig a bit.
Jordan Crawford