How can I learn Hadoop through projects?

What are the projects that can be done as a final year project on Hadoop?

  • I am a final-year student and an enthusiastic learner of Apache Hadoop and the MapReduce framework. Please suggest some projects that can be implemented with this technology. I have 6 months to learn and implement. I also have an idea about developing a small search engine that indexes pages. Is this feasible to implement?

  • Answer:

    I would suggest checking whether you can implement sentiment analytics on a Twitter data feed using Hive on Hadoop. Apache Hive provides built-in functions and support for algorithms such as n-grams and context n-grams: https://cwiki.apache.org/confluence/display/Hive/StatisticsAndDataMining

    You can modularize your project into the following streams:

    1. Collect the Twitter data feed using Twitter's API.
    2. Pump the data coming from Twitter into the Hadoop cluster.
    3. Design and develop Hive tables using the Twitter feed schema.
    4. Run ngrams/context_ngrams on the Twitter data (see the sketch below).

    The link above also guides you through implementing the following use cases:

    1. (ngrams) Find important topics in text in conjunction with a stopword list.
    2. (ngrams) Find trending topics in text.
    3. (context_ngrams) Extract marketing intelligence around certain words (e.g., "Twitter is ___").
    4. (ngrams) Find frequently accessed URL sequences.
    5. (context_ngrams) Find frequently accessed URL sequences that start or end at a particular URL.
    6. (context_ngrams) Pre-compute common search lookaheads.

    Hope this is helpful. Also take a look at this example of finding trending topics with stopwords: http://bigdatabloggin.blogspot.com/2012/08/trending-topics-in-hive-ngrams.html
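
    As a starting point for step 4, here is a minimal sketch of the same n-gram counting written as a Hadoop Streaming job in Python, in case you want to see it as a raw MapReduce job rather than through Hive's ngrams() UDF. The input format (one raw tweet text per line) and the paths in the usage comment are assumptions for illustration; adapt them to your actual feed schema and cluster.

```python
#!/usr/bin/env python
# ngram_streaming.py - a minimal Hadoop Streaming sketch that counts word
# bigrams in tweet text, roughly what Hive's ngrams() UDF computes at scale.
# Usage (assumed paths, adjust to your cluster):
#   hadoop jar hadoop-streaming.jar \
#     -files ngram_streaming.py \
#     -mapper "python ngram_streaming.py map" \
#     -reducer "python ngram_streaming.py reduce" \
#     -input /data/tweets -output /data/tweet_bigrams
import re
import sys
from itertools import groupby

N = 2  # size of the n-grams to count


def mapper():
    """Emit each n-gram in a tweet as '<ngram>\t1'."""
    for line in sys.stdin:
        words = re.findall(r"[a-z']+", line.lower())
        for i in range(len(words) - N + 1):
            print("%s\t1" % " ".join(words[i:i + N]))


def reducer():
    """Sum the counts for each n-gram (input arrives sorted by key)."""
    key_of = lambda kv: kv.split("\t", 1)[0]
    for ngram, group in groupby(sys.stdin, key=key_of):
        total = sum(int(kv.split("\t", 1)[1]) for kv in group)
        print("%s\t%d" % (ngram, total))


if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

    The resulting counts can then be loaded into a Hive table, or you can skip this and call ngrams()/context_ngrams() directly on a tweets table as described above.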

Tanmay Deshpande at Quora


Other answers

Yes, indexing is a good use case. Aggregating counters and unique users across dimensions is another, as is analytics in general. Also compression, encoding, transformation, and ETL. Graph computation is faster in Spark.
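
As a small illustration of the "counters and unique users per dimension" idea, here is a hedged PySpark sketch; the input path and the column names (country, user_id, event) are invented for the example.

```python
# A minimal PySpark sketch of aggregating event counters and unique users
# per dimension. The input path and the column names (country, user_id)
# are assumptions for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("counters-by-dimension").getOrCreate()

# One JSON object per line, e.g. {"user_id": "u1", "country": "DE", "event": "click"}
events = spark.read.json("hdfs:///data/events")  # hypothetical path

summary = (events
           .groupBy("country")                              # the dimension
           .agg(F.count("*").alias("events"),               # raw counter
                F.countDistinct("user_id").alias("users"))  # unique users
           .orderBy(F.desc("events")))

summary.show()
spark.stop()
```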

Nicolae Marasoiu

I work in a similar domain; here are a few topics where you can use Hadoop to deal with large datasets:

- Sentiment analysis for Twitter and web articles: Identify the overall sentiment of web articles, product reviews, movie reviews, and tweets. A lexicon-based approach or machine learning techniques can be used (a minimal sketch of the lexicon approach follows after this list).
- Web article classification/summarization: Use clustering/classification techniques to classify web articles, and perform semantic analysis to summarize them.
- Recommendation system based on users' social media profiles: Use social media APIs to collect user interests from Facebook, Twitter, etc., and implement a recommendation system for those interests.
- Tweet classification and trend detection: Classify tweets into sports, business, politics, entertainment, etc., and detect trending tweets in those domains.
- Movie review prediction: Use online movie reviews to predict reviews of new movies.
- Summarize restaurant reviews: Take a list of reviews about a restaurant and generate a single English summary for that restaurant.
- AutoBot: Build a system that can have a conversation with you. The user types messages, and your system replies based on the user's text. There are many approaches here; you could use a large Twitter corpus and do language similarity.
- Twitter-based news system: Collect tweets for various categories on an hourly or daily basis, identify trending discussions, perform semantic analysis, and create a kind of news system (check the Frrole product).

Here are a few datasets I have compiled -
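
Below is a minimal sketch of the lexicon-based approach mentioned in the first item. The tiny inline word list is only a stand-in for a real lexicon such as AFINN; the words and scores here are illustrative.

```python
# Minimal lexicon-based sentiment scoring, the simplest of the approaches
# mentioned above. The tiny LEXICON below is a placeholder; in a real
# project you would load a full word list such as AFINN.
import re

LEXICON = {           # illustrative scores only
    "good": 2, "great": 3, "love": 3, "happy": 2,
    "bad": -2, "terrible": -3, "hate": -3, "sad": -2,
}


def sentiment(text):
    """Return (score, label) for a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(w, 0) for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return score, label


if __name__ == "__main__":
    for tweet in ["I love this movie, it was great!",
                  "Terrible service, I hate waiting."]:
        print(sentiment(tweet), "-", tweet)
```

The same function can be dropped into a Hadoop Streaming mapper or a Spark map() so it scales to a full tweet corpus.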

Pathan Karimkhan

Thanks for the A2A. I would suggest working on a data science use case such as a recommendation system, using Hadoop/Spark as the computing platform. Prefer not to use built-in machine learning libraries like Mahout/MLlib.
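
In that spirit, here is a hedged sketch of an item-to-item co-occurrence recommender written directly against Spark's RDD API, with no Mahout or MLlib. The input format ("user_id,item_id" per line), the HDFS path, and the target item id are assumptions for illustration.

```python
# A small item-to-item co-occurrence recommender on plain Spark RDDs,
# avoiding Mahout/MLlib as suggested above. Input format is assumed to be
# "user_id,item_id" per line.
from itertools import combinations
from pyspark import SparkContext

sc = SparkContext(appName="cooccurrence-recommender")

# (user, item) pairs, e.g. "u1,item42"
pairs = (sc.textFile("hdfs:///data/user_item.csv")   # hypothetical path
           .map(lambda line: tuple(line.strip().split(","))))

# For every user, emit each unordered pair of items they both interacted with.
item_pairs = (pairs.groupByKey()
                   .flatMap(lambda kv: combinations(sorted(set(kv[1])), 2)))

# Count how often each item pair co-occurs across users.
cooccurrence = item_pairs.map(lambda p: (p, 1)).reduceByKey(lambda a, b: a + b)

# For a given item, the most frequently co-occurring items are its recommendations.
target = "item42"  # illustrative item id
recs = (cooccurrence
        .flatMap(lambda pc: [(pc[0][1], pc[1])] if pc[0][0] == target
                 else ([(pc[0][0], pc[1])] if pc[0][1] == target else []))
        .takeOrdered(10, key=lambda ic: -ic[1]))

print(recs)
sc.stop()
```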

Ankit Sharma

The following are some Hadoop project ideas:

-- Real estate analysis system
-- Sentiment analysis using Hadoop
-- Recommendation engine for inventory products
-- Product marketing research by analyzing products currently on the market

Kailash Aade

A project on MapReduce itself (algorithms, searching and sorting techniques, graphs) is also a very good candidate.
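
If you go that route, the map / shuffle-sort / reduce pattern is easy to study locally before moving to a cluster. The toy sketch below simulates the three phases in plain Python on a made-up graph problem (node degrees from an edge list); the edge data is invented.

```python
# A local simulation of the map / shuffle-sort / reduce phases on a toy
# graph problem: computing node degrees from an edge list. The edges are
# made-up sample data; on a cluster the same logic would be a MapReduce job.
from itertools import groupby
from operator import itemgetter

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]  # sample input

# Map: each undirected edge contributes one degree to both endpoints.
mapped = [(node, 1) for u, v in edges for node in (u, v)]

# Shuffle/sort: group records by key, which Hadoop does between phases.
mapped.sort(key=itemgetter(0))

# Reduce: sum the values for each key.
degrees = {node: sum(c for _, c in group)
           for node, group in groupby(mapped, key=itemgetter(0))}

print(degrees)  # {'A': 2, 'B': 2, 'C': 3, 'D': 1}
```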

Vamsi Mohan

You can go through the links below, which will help you start working on Hadoop projects. The first links are small MapReduce use cases in Hadoop and will help you get a hold on MapReduce concepts:

Link 1: https://acadgild.com/blog/mapreduce-use-case-uber-data-analysis/
Link 2: https://acadgild.com/blog/mapreduce-use-case-youtube-data-analysis/
Link 3: https://acadgild.com/blog/analyzing-titanic-data-with-hadoop-mapreduce/

The next links relate to sentiment analysis using Hadoop components such as Pig and Hive:

Link 4: https://acadgild.com/blog/daily-show-data-analysis-using-pig/
Link 5: https://acadgild.com/blog/pig-use-case-daily-show-data-analysis-part-ii/
Link 6: https://acadgild.com/blog/determining-popular-hashtags-in-twitter-using-pig/
Link 7: https://acadgild.com/blog/sentiment-analysis-on-twitter-timezone-wise-analysis/
Link 8: https://acadgild.com/blog/counting-hashtags-using-hive/
Link 9: https://acadgild.com/blog/sentiment-analysis-on-tweets-using-afinn-dictionary/
Link 10: https://acadgild.com/blog/sentiment-analysis-on-tweets-with-apache-hive-using-afinn-dictionary/

For beginner's-level use cases in Spark, refer to the links below:

Link 11: https://acadgild.com/blog/healthcare-use-case-apache-spark/
Link 12: https://acadgild.com/blog/introduction-spark-rdd-basic-operations-rdd/
Link 13: https://acadgild.com/blog/analyzing-new-york-crime-data-using-sparksql/
Link 14: https://acadgild.com/blog/spark-use-case-travel-data-analysis/
Link 15: https://acadgild.com/blog/spark-use-case-uber-data-analysis/
Link 16: https://acadgild.com/blog/spark-use-case-analyzing-movielens-dataset/
Link 17: https://acadgild.com/blog/spark-use-case-social-media-analysis/

Visit our website http://www.acadgild.com/ for more real-time use cases and projects on big data technologies like Hadoop, Spark, Machine Learning, etc.

Satyam Kumar

You have many options for your project. However, in my opinion, if you want to make a difference or prefer tougher topics, you should consider compression/decompression codecs, searching/sorting algorithms, and matching and signature-analysis algorithms for HDFS.
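
If the codec idea appeals to you, a simple first experiment is to compare compression ratio and speed of a few standard algorithms on a sample of your data; the file path below is a placeholder, and on HDFS you would additionally weigh splittability (bzip2 is splittable, plain gzip is not).

```python
# A first experiment for the codec idea: compare compression ratio and time
# of a few standard algorithms on a sample file. The path is a placeholder;
# use a file that resembles the data you would store in HDFS.
import bz2
import gzip
import lzma
import time

SAMPLE = "sample.log"  # placeholder: any representative data file

with open(SAMPLE, "rb") as f:
    data = f.read()

codecs = {"gzip": gzip.compress, "bzip2": bz2.compress, "lzma": lzma.compress}

for name, compress in codecs.items():
    start = time.time()
    compressed = compress(data)
    elapsed = time.time() - start
    print("%-6s ratio=%.2f  time=%.2fs" %
          (name, len(data) / max(len(compressed), 1), elapsed))
```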

Bezan
