Is there a clean way to parse HTML?

Best way to parse HTML table

Tarmon at Stack Overflow

Other answers

With lynx I can do:

    $ lynx -dump http://www.cityofames.org/ftp/routes/Fall/wdreds\&w.html
    6:25 6:31 6:36 6:41 ----- 6:46 6:50 6:56 7:02 7:08 7:14 7:20 ----- 7:26 7:30 7:36 ----- ----- ----- ----- 7:38 7:43 7:47 7:53 1A
    7:28 7:35 7:42 7:48 ----- 7:56 8:00 8:06 ----- ----- ----- ----- 7:58 8:03 8:07 8:13 1A
    ...

The output becomes very easy to parse with a scripting language of choice. html2text may also work (I have never used it). You could also play around with grep/sed to format it.
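The "scripting language of choice" step could look like the following Python sketch. It assumes the dump has been captured as text and that each timetable row comes out on its own line ending in a route label such as 1A; the names `DUMP`, `CELL` and `parse_rows` are hypothetical. Times are kept as strings and each `-----` (no stop) becomes `None`.

```python
import re

# The two rows of `lynx -dump` output quoted above (assumption: each
# timetable row sits on its own line and ends with a route label like "1A").
DUMP = """\
6:25 6:31 6:36 6:41 ----- 6:46 6:50 6:56 7:02 7:08 7:14 7:20 ----- 7:26 7:30 7:36 ----- ----- ----- ----- 7:38 7:43 7:47 7:53 1A
7:28 7:35 7:42 7:48 ----- 7:56 8:00 8:06 ----- ----- ----- ----- 7:58 8:03 8:07 8:13 1A
"""

# A cell is either a clock time (6:25, 12:05) or a "-----" placeholder.
# Route labels such as "1A" match neither alternative, so they are dropped.
CELL = re.compile(r"\d{1,2}:\d{2}|-{3,}")


def parse_rows(text):
    """Turn each line of dumped text that contains cells into a list.

    Times stay as strings; a "-----" placeholder becomes None.
    """
    rows = []
    for line in text.splitlines():
        cells = CELL.findall(line)
        if not cells:
            continue  # skip headings, blank lines and other non-table text
        rows.append([None if c.startswith("-") else c for c in cells])
    return rows


if __name__ == "__main__":
    for row in parse_rows(DUMP):
        print(row)
```

With this input, `parse_rows(DUMP)` returns one list per row, so the first row starts `['6:25', '6:31', '6:36', '6:41', None, ...]`.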

fifo

Real-world HTML is often too loosely structured (unclosed or mismatched tags) to be handled reliably by a strict parser. First convert it to a reasonably close, well-formed XML format such as XHTML (well-formed meaning every tag is properly matched) using a program like tidy (http://tidy.sourceforge.net/). You can then parse the well-formed result with an XML/XHTML parser. Note that you will have to pick out your data by its font styling and turn the styled tags (here, <b>) into an array of times. Here is what you can do while parsing:

    start TR element  -- create an array
      start B element -- add one time
      end B element
      start B element -- add a second time
      end B element
    end TR element
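That event scheme maps directly onto a SAX parser. Below is a minimal sketch using Python's standard `xml.sax` module. It assumes the page has already been converted to well-formed XHTML (for example with `tidy -asxhtml -numeric`; `-numeric` makes tidy emit numeric character references instead of named entities such as `&nbsp;`, which a bare XML parser may not recognize) and that each time is the text of a `<b>` element inside a `<tr>`. The `SAMPLE` fragment and the names `TimeTableHandler` and `parse_times` are hypothetical.

```python
import xml.sax

# Hypothetical well-formed XHTML fragment, as tidy might produce it.
# Assumption: every time is the text of a <b> element inside a <tr>.
SAMPLE = """<table>
  <tr><td><b>6:25</b></td><td><b>6:31</b></td></tr>
  <tr><td><b>7:28</b></td><td><b>7:35</b></td></tr>
</table>"""


class TimeTableHandler(xml.sax.ContentHandler):
    """Collects the text of every <b> element, grouped by enclosing <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []    # one list of times per <tr>
        self.row = None   # the row currently being filled, if any
        self.text = None  # text buffer while inside <b>, else None

    def startElement(self, name, attrs):
        if name == "tr":
            self.row = []             # start TR: create an array
        elif name == "b" and self.row is not None:
            self.text = []            # start B: begin collecting a time

    def characters(self, content):
        # SAX may deliver one element's text in several chunks.
        if self.text is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "b" and self.text is not None:
            self.row.append("".join(self.text).strip())  # end B: add the time
            self.text = None
        elif name == "tr" and self.row is not None:
            self.rows.append(self.row)                   # end TR: row done
            self.row = None


def parse_times(xhtml):
    """Parse a well-formed XHTML string and return the times row by row."""
    handler = TimeTableHandler()
    xml.sax.parseString(xhtml.encode("utf-8"), handler)
    return handler.rows
```

Here `parse_times(SAMPLE)` returns `[['6:25', '6:31'], ['7:28', '7:35']]`. Buffering `characters()` until the closing `</b>` matters because a SAX parser is free to split text across several callbacks.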

koya
