How can I learn Hadoop through projects?

What are some good open source projects that use Hadoop/HBase for somebody who wants to learn and use them in a real-world scenario?

  • I have experience with C++, Python, Java, and Perl. I understand the basic concepts of MapReduce and distributed computing, but I want a practical way to understand why such systems are needed and how they are used to handle large amounts of data. I think the best way to learn is to get involved in a project that already uses these frameworks.

  • Answer:

    Mahout is a project that uses Hadoop as infrastructure for machine learning. You can start with the Quickstart page, https://cwiki.apache.org/confluence/display/MAHOUT/Quickstart, and work your way through the examples.

Yuval Feinstein at Quora

Other answers

Check out Crux Reporting for HBase: http://github.com/sonalgoyal/crux

Sonal Goyal

You may also want to take a look at Disco (http://discoproject.org/). It's a good way to begin writing distributed code and to learn the fundamentals of distributed computing. The Disco documentation is also pretty comprehensive.

Nikhil Singh
