What are some interesting beginner level projects that can be built using Apache Hadoop?
-
I'm a beginner with the Hadoop MapReduce framework and want to know which real-world applications I can build using Apache Hadoop. I'm more of an infrastructure engineer and less of a Machine Learning/AI person, and I'm trying to move beyond the word-count examples. Taking this into account, what are the best ways for me to get started?
-
Answer:
Spring Force Algorith...
Prashant Raaghav at Quora
Other answers
Try making a job that efficiently counts word occurrence and co-occurrence, and then computes term log-likelihood similarity. This is interesting in practical terms because it's the essential basis of many similarity functions. In Hadoop terms it's interesting because to make it fast you will have to get into several non-trivial parts of Hadoop:
- Multiple outputs
- Combiners
- Custom writables
- Multi-step pipelines
- Multiple inputs
- Grouping / comparator classes
- Tuning mapper output
You will also have to think about the accuracy / speed tradeoff: you'll likely have to look at pruning pairs that occur very few times.
Sean Owen
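A rough local sketch of the job Sean Owen describes, in plain Python rather than Hadoop code. It simulates the mapper, a combiner, and the reducer in one process, prunes rare pairs, and scores a 2x2 contingency table with a log-likelihood ratio. The function names, the per-line co-occurrence window, and the entropy form of the ratio are assumptions made for illustration, not something the answer specifies.

```python
import math
from collections import Counter
from itertools import combinations

def map_pairs(line):
    """Mapper: emit (pair, 1) for every unordered pair of distinct words in a line."""
    words = sorted(set(line.lower().split()))
    for pair in combinations(words, 2):
        yield pair, 1

def sum_counts(records):
    """Combiner and reducer share the same logic: sum the counts per key."""
    totals = Counter()
    for key, count in records:
        totals[key] += count
    return totals

def cooccurrence(lines, min_count=1):
    # Each line stands in for one input split and is combined locally first,
    # which is what cuts down the data shuffled to the reducers.
    partials = [sum_counts(map_pairs(line)) for line in lines]
    merged = sum_counts((k, c) for part in partials for k, c in part.items())
    # Prune pairs that occur fewer than min_count times (accuracy / speed tradeoff).
    return {k: c for k, c in merged.items() if c >= min_count}

def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table: k11 = both terms, k12 and k21 = one
    term only, k22 = neither. Assumed form; clamped at 0 against rounding error."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

if __name__ == "__main__":
    pairs = cooccurrence(["a b c", "a b", "b c"], min_count=2)
    print(pairs)  # {('a', 'b'): 2, ('b', 'c'): 2}
```

On a real cluster the pair key would likely be a custom writable and the counting and scoring would run as separate steps of a pipeline; the sketch only shows the data flow.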
I would like to refer you to the following link, which lists projects of all kinds that can be done using Hadoop: http://atbrox.com/2011/11/09/mapreduce-hadoop-algorithms-in-academic-papers-5th-update-%E2%80%93-nov-2011/
Ananda Prakash Verma
Find the sum of all even and all odd numbers. Given a sequence of numbers in an HDFS file (one per line), calculate the sum of the even numbers and the sum of the odd numbers. For example:
input sequence: 1,2,3,4
output sequence: Odd 4 Even 6
You will learn more about optimization and about how the reducer uses the key to group the output of the mapper.
Additional links: http://wildanm.wordpress.com/2009/10/15/project-ideas-for-hadoop/
Courtesy: http://stackoverflow.com/users/190767/adam
Asif Junaid
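For the even/odd exercise in Asif Junaid's answer, a minimal local sketch in plain Python (not Hadoop code) could look like the following. The mapper keys each number by parity and the reducer sums the values per key; the function names are made up for illustration, and the in-memory grouping stands in for Hadoop's shuffle.

```python
from collections import defaultdict

def map_parity(line):
    """Mapper: emit ("Even" or "Odd", number) for one input line."""
    n = int(line.strip())
    yield ("Even" if n % 2 == 0 else "Odd"), n

def reduce_sum(key, values):
    """Reducer: all values sharing a key arrive together and are summed."""
    return key, sum(values)

def run_job(lines):
    # Group mapper output by key, as the shuffle phase would.
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        for key, value in map_parity(line):
            groups[key].append(value)
    return dict(reduce_sum(k, v) for k, v in groups.items())

if __name__ == "__main__":
    print(run_job(["1", "2", "3", "4"]))  # {'Odd': 4, 'Even': 6}
```

Because addition is associative and commutative, the same summing logic could also serve as a combiner, which is one place where the optimization mentioned in the answer shows up.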
Write a crawler (http://en.wikipedia.org/wiki/Web_crawler) as a Hadoop MapReduce job that downloads pages and stores the records in HBase or a database.
Input: a single seed file, or a folder containing n seed files.
Algorithm:
- Partitioner: send each seed to a single map task.
- Mapper: read each line of the input and download that URL using Apache HTTP Client or Mina HTTP Client.
- Reducer: parse the downloaded content using Tika, extract the links, and write them to a file so that it can be used as the seed for the next run. In the reducer you can also store the extracted meta info, like title and body content, so that you have a very basic search engine as well :)
That's how I started. The above logic requires about 100 lines of code at most, and you will have a distributed crawler on Hadoop.
PS. The additional information below is only needed if you want to improve the crawler. Once you are done with the basics, you can add a lot of features:
- Send all URLs of the same host to a single mapper, so that not all mappers request data from one host at the same time. (You can use the HTTP keep-alive flag as well.) A single robots.txt fetch per host is then enough to decide whether to fetch or skip a URL.
- Use HBase or Cassandra for storing the content, so your query engine can make use of it.
Clement Jebakumar
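One round of the crawler from Clement Jebakumar's answer can be sketched locally in plain Python. This is an illustration under several assumptions: the standard-library `html.parser` stands in for Tika, the `fetch` callable is a hypothetical placeholder for the HTTP client, and `host_partition` shows one possible way to keep a host on a single partition. None of these names come from the answer.

```python
import zlib
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collects absolute link targets and the page title."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def host_partition(url, num_partitions):
    """Partitioner idea: every URL of one host maps to the same partition."""
    host = urlparse(url).netloc
    return zlib.crc32(host.encode("utf-8")) % num_partitions

def map_fetch(url, fetch):
    """Mapper: download one seed URL; fetch is supplied by the caller."""
    yield url, fetch(url)

def reduce_parse(url, html):
    """Reducer: parse the page, keep the title, and extract outgoing links."""
    parser = LinkExtractor(url)
    parser.feed(html)
    return {"url": url, "title": parser.title.strip(), "links": parser.links}

def crawl_round(seeds, fetch):
    """Run one map/reduce round; return parsed pages and the next seed list."""
    pages, next_seeds = [], set()
    for seed in seeds:
        for url, html in map_fetch(seed, fetch):
            page = reduce_parse(url, html)
            pages.append(page)
            next_seeds.update(page["links"])
    return pages, sorted(next_seeds - set(seeds))
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; in a real job it would wrap whichever HTTP client you choose, and the returned seed list would be written out as the input of the next run.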
Read the Twitter feed with an account and collect the real-time tweets for a particular tag. For example:
- Read the complete tweets with a hashtag such as #WorldCup2015 or #58thGrammy.
- Count how many of them contain media and how many are text only.
- Find any additional tags that appear in the text.
- Think of additional ways to organize this information.
- Optionally, break the results down by location.
In the above example, #WorldCup2015 is just a parameter that can be passed in from a GUI or an Excel spreadsheet. Likewise, you can change all of the parameters as you wish and get the information you need. Good luck!
Hemanth Pradeep
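The counting part of Hemanth Pradeep's idea can be sketched in plain Python over tweets that have already been collected. The record layout (a dict with a `text` string and an optional `has_media` flag) is a hypothetical stand-in, not the Twitter API's format, and collecting the tweets is left out entirely.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")

def map_tweet(tweet, tag):
    """Mapper: for tweets carrying the requested tag, emit one record for the
    tweet kind (media or text) and one per additional hashtag."""
    tags = {t.lower() for t in HASHTAG.findall(tweet["text"])}
    wanted = tag.lower()
    if wanted not in tags:
        return  # tweet does not mention the requested tag
    yield ("kind", "media" if tweet.get("has_media") else "text"), 1
    for other in tags - {wanted}:
        yield ("tag", other), 1

def summarize(tweets, tag):
    """Reducer: sum the emitted counts per key."""
    counts = Counter()
    for tweet in tweets:
        for key, n in map_tweet(tweet, tag):
            counts[key] += n
    return counts
```

The tag is an ordinary parameter here, which matches the suggestion of passing it in from outside; a location breakdown could be added by emitting one more key per tweet if the collected records carry that field.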