Best way to parse an HTML table
-
I am interested in parsing the following table and others like it:
http://www.cityofames.org/ftp/routes/Fall/wdreds&w.html
Any suggestions on the best tool for the job? After searching around, I can't decide what I should use and would like some feedback before committing to something. I am open to any language or tool.
-
Answer:
If you are looking for an HTML parser, there are a number of options in Java:
- JTidy: http://jtidy.sourceforge.net/
- NekoHTML: http://nekohtml.sourceforge.net/
- jsoup: http://jsoup.org/
- TagSoup: http://home.ccil.org/~cowan/XML/tagsoup/
You might also want to read this comprehensive discussion of the pros and cons of each: http://stackoverflow.com/questions/3152138/what-are-the-pros-and-cons-of-the-leading-java-html-parsers/
Tarmon at Stack Overflow
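The libraries listed above are third-party dependencies. As a dependency-free illustration of the same idea, here is a minimal sketch that uses the lenient HTML parser bundled with the JDK (`javax.swing.text.html.parser.ParserDelegator`). Note that this is a swapped-in stand-in, not one of the libraries in the answer, and it only understands HTML 3.2-era markup. The class name `TableCells` is hypothetical. It groups the text of every `<td>`/`<th>` cell by `<tr>`, and it does not try to handle nested tables.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TableCells {
    /** Returns every table row as a list of its trimmed cell texts, in document order. */
    public static List<List<String>> parse(String html) {
        List<List<String>> rows = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private List<String> row;   // cells of the current <tr>, null outside a row
            private StringBuilder cell; // text of the current <td>/<th>, null outside a cell

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TR) {
                    row = new ArrayList<>();
                } else if ((tag == HTML.Tag.TD || tag == HTML.Tag.TH) && row != null) {
                    cell = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Text inside nested tags such as <b> or <font> also lands in the current cell.
                if (cell != null) {
                    cell.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if ((tag == HTML.Tag.TD || tag == HTML.Tag.TH) && cell != null) {
                    row.add(cell.toString().trim());
                    cell = null;
                } else if (tag == HTML.Tag.TR && row != null) {
                    rows.add(row);
                    row = null;
                }
            }
        };
        try {
            // true = ignore any charset declared in a <meta> tag; the input is already a String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

With jsoup or one of the other libraries, the traversal would go through that library's own API instead of this callback.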
Other answers
With lynx I can do:
$ lynx -dump http://www.cityofames.org/ftp/routes/Fall/wdreds\&w.html
6:25 6:31 6:36 6:41 ----- 6:46 6:50 6:56 7:02 7:08 7:14 7:20 ----- 7:26 7:30 7:36 ----- ----- ----- ----- 7:38 7:43 7:47 7:53 1A 7:28 7:35 7:42 7:48 ----- 7:56 8:00 8:06 ----- ----- ----- ----- 7:58 8:03 8:07 8:13 1A ...
This output is easy to parse with the scripting language of your choice. html2text may also work (I have never used it). You could also use grep/sed to format it.
fifo
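Here is a minimal sketch of the "parse it with a scripting language" step, written in Java, the language of the accepted answer. It makes several assumptions that have not been checked against the actual page: each line of the `lynx -dump` output is one table row, cells are separated by whitespace, a run of dashes such as `-----` marks a stop that is not served, and a trailing non-time token such as `1A` is a route label. `LynxRows` and `Row` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LynxRows {
    // A departure time such as 6:25 or 12:05.
    private static final Pattern TIME = Pattern.compile("\\d{1,2}:\\d{2}");
    // Assumed "no stop" placeholder: one or more dashes, e.g. -----.
    private static final Pattern NO_STOP = Pattern.compile("-+");

    /** One schedule row: times in column order (null = no stop) and an optional label. */
    public static final class Row {
        public final List<String> times;
        public final String label;

        Row(List<String> times, String label) {
            this.times = times;
            this.label = label;
        }
    }

    /** Parses lynx -dump text, keeping only lines that contain at least one time or placeholder. */
    public static List<Row> parse(String dump) {
        List<Row> rows = new ArrayList<>();
        for (String line : dump.split("\\R")) {
            List<String> times = new ArrayList<>();
            String label = null;
            for (String token : line.trim().split("\\s+")) {
                if (TIME.matcher(token).matches()) {
                    times.add(token);
                } else if (NO_STOP.matcher(token).matches()) {
                    times.add(null);
                } else if (!token.isEmpty()) {
                    label = token; // e.g. the route label "1A"; the last such token wins
                }
            }
            if (!times.isEmpty()) {
                rows.add(new Row(times, label));
            }
        }
        return rows;
    }
}
```

Keeping a `null` for each dash placeholder means every time stays in its column, so the row can later be matched against the stop names in the header.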
Real-world HTML like this is often not well-formed, so a strict XML parser cannot read it directly. You need to first convert it to a reasonably close XML format such as XHTML (well-formed, meaning every tag is properly matched) using a program like tidy (http://tidy.sourceforge.net/). You can then use an XML/XHTML parser to parse the well-formed result. Note that you will have to process the data based on font styles, converting the styled tags in each row into an array of times. Here is what you can do while parsing:
- start of TR element: create an array
- start of b element: add the first time
- end of b element
- start of b element: add the second time
- end of b element
- end of TR element
koya
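The parsing outline above maps naturally onto a SAX handler. Below is one way to write it with the XML parser built into the JDK (`javax.xml.parsers.SAXParserFactory`). It assumes the page has already been converted to well-formed XHTML by tidy, for example with `tidy -asxml -numeric`, so that named entities such as `&nbsp;` become numeric (an XML parser rejects undeclared named entities when the DTD is not loaded). Following the answer, each `<tr>` starts a new list, and the text of each `<b>` element inside it is added as one time. Which cells are actually bold on the real page has not been verified. `XhtmlTimes` is a hypothetical class name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlTimes {
    /** Returns, for each <tr>, the texts of the <b> elements it contains. */
    public static List<List<String>> parse(String xhtml) {
        List<List<String>> rows = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private List<String> row;   // start TR: create array
            private StringBuilder bold; // text of the current <b>, null outside one

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                String name = qName.toLowerCase(Locale.ROOT);
                if (name.equals("tr")) {
                    row = new ArrayList<>();
                } else if (name.equals("b") && row != null) {
                    bold = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (bold != null) {
                    bold.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String name = qName.toLowerCase(Locale.ROOT);
                if (name.equals("b") && bold != null) {
                    row.add(bold.toString().trim()); // end b: add one time
                    bold = null;
                } else if (name.equals("tr") && row != null) {
                    rows.add(row); // end TR: the array for this row is complete
                    row = null;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Do not fetch the XHTML DTD that tidy references in its DOCTYPE.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return rows;
    }
}
```

The factory is left non-namespace-aware, so `qName` is the plain tag name even when tidy adds the XHTML default namespace.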