How to extract data from any website?

How do I extract XML data from a website?

Answer:

Extracting data from websites with Kettle (Pentaho Data Integration, PDI) ranges from easy to difficult. The easy case is where you use an HTTP step to call the website address, and the response comes back into the stream as a field (column).

The most difficult case is where you have to navigate someone's login logic. There you use PDI to create shell scripts that call wget or curl to handle the interaction while saving cookies and session information, then make a final wget or curl call that causes the website to give you a result page. That page could very well be an XML response, but it is more likely HTML. The final call downloads the HTML file to disk temporarily; you then load it into memory (single row, single field), use JTidy via a User Defined Java Class (UDJC) step to convert the HTML to XHTML, and run that stream through the XPath step. So it is tedious, but doable.

I am talking with Pentaho about making this process easier, at least until cloud-based SaaS providers figure out that letting data out through great APIs is important to customers and could be monetized. The reality is that there will always be some great web application, managed by a small company, that may not have the time, money, or resources to let data out to machines in an easy way. We have a ton of them in healthcare.

Brandon Jackson at Quora
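
For readers who want to see the shape of that pipeline outside Kettle, here is a rough Python sketch using only the standard library. It is not the answer author's method: `http.cookiejar` stands in for the wget/curl cookie handling, a lenient `html.parser` tree builder stands in for JTidy, and ElementTree's limited XPath subset stands in for the XPath step. The function names, URLs, and form fields are hypothetical, and real login flows (CSRF tokens, JavaScript, redirects) usually need more work than this.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def fetch_after_login(login_url, form_fields, result_url):
    """POST a login form, keep the session cookies, then fetch the result page.

    Hypothetical helper: the URLs and field names depend entirely on the site.
    """
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    opener.open(login_url, data=urlencode(form_fields).encode("utf-8")).close()
    with opener.open(result_url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LenientTreeBuilder(HTMLParser):
    """Build a well-formed element tree from loosely written HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        element = ET.SubElement(self.stack[-1], tag,
                                {key: value or "" for key, value in attrs})
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray closers.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element belongs in its tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_tree(html_text):
    """Parse HTML text into an ElementTree element (single in-memory document)."""
    builder = LenientTreeBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.root


def extract(html_text, path):
    """Return the stripped text of every element matching an ElementTree path."""
    return ["".join(node.itertext()).strip()
            for node in html_to_tree(html_text).findall(path)]
```

With a downloaded page held in a string `page`, a call such as `extract(page, ".//td[@class='name']")` returns the text of each matching cell. The tree builder only handles void tags and stray closing tags; unclosed `<td>` or `<li>` elements end up nested, so a dedicated HTML cleaner is still the safer choice for messy pages.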
