What is the best way to download Wikipedia articles, parse them, and store them in my database?
-
I downloaded one of the dumps, and I am not sure what I need in order to decompress it. Do I need a Hadoop cluster for that, or do I only need enough storage for all the files? I then want to convert the articles into a DOM and store them in a MySQL database. Also, how should I update the articles stored in the database when a new version of the dumps comes out?
-
Answer:
Get the Wikipedia dumps here: http://en.wikipedia.org/wiki/Wikipedia:Database_download
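For the decompress, parse, store, and update steps asked about in the question, here is a minimal sketch rather than a definitive implementation. It assumes a bzip2-compressed XML dump in the MediaWiki export layout (page > title, id, revision > id, text). The dump is read as a stream, so decompression itself should not need a cluster, only one machine with enough disk. The names iter_pages and load_dump and the pages table are hypothetical, and sqlite3 stands in for MySQL so the example runs without a server. The stored text is raw wikitext; turning it into a DOM-like tree would need a separate wikitext parser, which this sketch leaves out.

```python
import bz2
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table layout; sqlite3 is a stand-in for MySQL here.
SCHEMA = """CREATE TABLE IF NOT EXISTS pages (
    page_id  INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    rev_id   INTEGER NOT NULL,
    wikitext TEXT NOT NULL)"""


def local(tag):
    """Drop the XML namespace prefix: '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def child(parent, name):
    """Return the first direct child called `name`, or None."""
    for node in parent:
        if local(node.tag) == name:
            return node
    return None


def text_of(parent, name):
    """Text of a direct child, or '' when it is missing or empty."""
    node = child(parent, name)
    if node is None or node.text is None:
        return ""
    return node.text


def iter_pages(dump_path):
    """Yield (page_id, title, rev_id, wikitext) one page at a time."""
    # bz2.open decompresses on the fly, so the full XML never sits on disk.
    with bz2.open(dump_path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if local(elem.tag) != "page":
                continue
            rev = child(elem, "revision")
            if rev is not None:
                yield (int(text_of(elem, "id")), text_of(elem, "title"),
                       int(text_of(rev, "id")), text_of(rev, "text"))
            elem.clear()  # drop the parsed subtree to limit memory use


def load_dump(conn, dump_path):
    """Insert new pages, replace pages that have a newer revision.

    Returns the number of rows inserted or replaced.
    """
    conn.execute(SCHEMA)
    changed = 0
    for page_id, title, rev_id, wikitext in iter_pages(dump_path):
        row = conn.execute(
            "SELECT rev_id FROM pages WHERE page_id = ?", (page_id,)
        ).fetchone()
        if row is not None and row[0] >= rev_id:
            continue  # the stored copy is already current
        conn.execute(
            "INSERT OR REPLACE INTO pages VALUES (?, ?, ?, ?)",
            (page_id, title, rev_id, wikitext),
        )
        changed += 1
    conn.commit()
    return changed
```

Under these assumptions, calling load_dump(sqlite3.connect("wiki.db"), path_to_dump) once per downloaded dump would cover both the first import and later updates: comparing revision ids means a newer dump only rewrites the pages that actually changed. Pages deleted from Wikipedia between dumps are not removed by this sketch.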
Anonymous at Quora
Other answers
Here is a post from my personal blog: http://www.justreadout.com/2015/01/read-wikipedia-articles-offline-by.html. Visit http://www.justreadout.com for more such tips and tricks.
Kaushal Mehra