How can I learn Hadoop through projects?

What are the projects that can be done as a final year project on Hadoop?

  • I am a final-year student and an enthusiastic learner of Apache Hadoop and the MapReduce framework. Please suggest some projects that can be implemented with this technology. I have 6 months to learn and implement. I also have an idea about developing a small search engine that indexes pages. Is this feasible to implement?

  • Answer:

    I would suggest checking whether you can implement sentiment analytics on a Twitter data feed using Hive on Hadoop. Apache Hive provides built-in functions and support for algorithms such as n-grams and context n-grams: https://cwiki.apache.org/confluence/display/Hive/StatisticsAndDataMining

    You can modularize your project into the following streams:

    1. Collect the Twitter data feed using Twitter's API.
    2. Pump the data coming from Twitter into the Hadoop cluster.
    3. Design and develop Hive tables using the Twitter feed schema.
    4. Run ngrams/context_ngrams on the Twitter data (see the sketch below).

    The link above also guides you through implementing the following use cases:

    1. (ngrams) Find important topics in text in conjunction with a stopword list.
    2. (ngrams) Find trending topics in text.
    3. (context_ngrams) Extract marketing intelligence around certain words (e.g., "Twitter is ___").
    4. (ngrams) Find frequently accessed URL sequences.
    5. (context_ngrams) Find frequently accessed URL sequences that start or end at a particular URL.
    6. (context_ngrams) Pre-compute common search lookaheads.

    Hope this is helpful. Also take a look at this example of finding trending topics with stopwords: http://bigdatabloggin.blogspot.com/2012/08/trending-topics-in-hive-ngrams.html
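
    As a starting point for step 4, here is a minimal sketch of the same n-gram counting written as a Hadoop Streaming job in Python, in case you want to see it as a raw MapReduce job rather than through Hive's ngrams() UDF. The input format (one raw tweet text per line) and the paths in the usage comment are assumptions for illustration; adapt them to your actual feed schema and cluster.

```python
#!/usr/bin/env python
# ngram_streaming.py - a minimal Hadoop Streaming sketch that counts word
# bigrams in tweet text, roughly what Hive's ngrams() UDF computes at scale.
# Usage (assumed paths, adjust to your cluster):
#   hadoop jar hadoop-streaming.jar \
#     -files ngram_streaming.py \
#     -mapper "python ngram_streaming.py map" \
#     -reducer "python ngram_streaming.py reduce" \
#     -input /data/tweets -output /data/tweet_bigrams
import re
import sys
from itertools import groupby

N = 2  # size of the n-grams to count


def mapper():
    """Emit each n-gram in a tweet as '<ngram>\t1'."""
    for line in sys.stdin:
        words = re.findall(r"[a-z']+", line.lower())
        for i in range(len(words) - N + 1):
            print("%s\t1" % " ".join(words[i:i + N]))


def reducer():
    """Sum the counts for each n-gram (input arrives sorted by key)."""
    key_of = lambda kv: kv.split("\t", 1)[0]
    for ngram, group in groupby(sys.stdin, key=key_of):
        total = sum(int(kv.split("\t", 1)[1]) for kv in group)
        print("%s\t%d" % (ngram, total))


if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

    The resulting counts can then be loaded into a Hive table, or you can skip this and call ngrams()/context_ngrams() directly on a tweets table as described above.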

Tanmay Deshpande at Quora


Other answers

Yes, indexing is a good use case. Aggregating counters and unique users across dimensions is another, as is analytics in general. Also compression, encoding, transformation, and ETL. Graph computation is faster in Spark.
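
As a small illustration of the "counters and unique users per dimension" idea, here is a hedged PySpark sketch; the input path and the column names (country, user_id, event) are invented for the example.

```python
# A minimal PySpark sketch of aggregating event counters and unique users
# per dimension. The input path and the column names (country, user_id)
# are assumptions for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("counters-by-dimension").getOrCreate()

# One JSON object per line, e.g. {"user_id": "u1", "country": "DE", "event": "click"}
events = spark.read.json("hdfs:///data/events")  # hypothetical path

summary = (events
           .groupBy("country")                              # the dimension
           .agg(F.count("*").alias("events"),               # raw counter
                F.countDistinct("user_id").alias("users"))  # unique users
           .orderBy(F.desc("events")))

summary.show()
spark.stop()
```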

Nicolae Marasoiu

I work in a similar domain; here are a few topics where you can use Hadoop to deal with large datasets:

- Sentiment analysis for Twitter and web articles: Identify the overall sentiment of web articles, product reviews, movie reviews, and tweets. A lexicon-based approach or machine learning techniques can be used (a minimal sketch of the lexicon approach follows after this list).
- Web article classification/summarization: Use clustering/classification techniques to classify web articles, and perform semantic analysis to summarize them.
- Recommendation system based on users' social media profiles: Use social media APIs to collect user interests from Facebook, Twitter, etc., and implement a recommendation system for those interests.
- Tweet classification and trend detection: Classify tweets into sports, business, politics, entertainment, etc., and detect trending tweets in those domains.
- Movie review prediction: Use online movie reviews to predict reviews of new movies.
- Summarize restaurant reviews: Take a list of reviews about a restaurant and generate a single English summary for that restaurant.
- AutoBot: Build a system that can have a conversation with you. The user types messages, and your system replies based on the user's text. There are many approaches here; you could use a large Twitter corpus and do language similarity.
- Twitter-based news system: Collect tweets for various categories on an hourly or daily basis, identify trending discussions, perform semantic analysis, and create a kind of news system (check the Frrole product).

Here are a few datasets I have compiled -
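
Below is a minimal sketch of the lexicon-based approach mentioned in the first item. The tiny inline word list is only a stand-in for a real lexicon such as AFINN; the words and scores here are illustrative.

```python
# Minimal lexicon-based sentiment scoring, the simplest of the approaches
# mentioned above. The tiny LEXICON below is a placeholder; in a real
# project you would load a full word list such as AFINN.
import re

LEXICON = {           # illustrative scores only
    "good": 2, "great": 3, "love": 3, "happy": 2,
    "bad": -2, "terrible": -3, "hate": -3, "sad": -2,
}


def sentiment(text):
    """Return (score, label) for a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(w, 0) for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return score, label


if __name__ == "__main__":
    for tweet in ["I love this movie, it was great!",
                  "Terrible service, I hate waiting."]:
        print(sentiment(tweet), "-", tweet)
```

The same function can be dropped into a Hadoop Streaming mapper or a Spark map() so it scales to a full tweet corpus.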

Pathan Karimkhan

Thanks for the A2A. I would suggest working on a data science use case such as a recommendation system, using Hadoop/Spark as the computing platform. Prefer not to use built-in machine learning libraries like Mahout/MLlib.
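
In that spirit, here is a hedged sketch of an item-to-item co-occurrence recommender written directly against Spark's RDD API, with no Mahout or MLlib. The input format ("user_id,item_id" per line), the HDFS path, and the target item id are assumptions for illustration.

```python
# A small item-to-item co-occurrence recommender on plain Spark RDDs,
# avoiding Mahout/MLlib as suggested above. Input format is assumed to be
# "user_id,item_id" per line.
from itertools import combinations
from pyspark import SparkContext

sc = SparkContext(appName="cooccurrence-recommender")

# (user, item) pairs, e.g. "u1,item42"
pairs = (sc.textFile("hdfs:///data/user_item.csv")   # hypothetical path
           .map(lambda line: tuple(line.strip().split(","))))

# For every user, emit each unordered pair of items they both interacted with.
item_pairs = (pairs.groupByKey()
                   .flatMap(lambda kv: combinations(sorted(set(kv[1])), 2)))

# Count how often each item pair co-occurs across users.
cooccurrence = item_pairs.map(lambda p: (p, 1)).reduceByKey(lambda a, b: a + b)

# For a given item, the most frequently co-occurring items are its recommendations.
target = "item42"  # illustrative item id
recs = (cooccurrence
        .flatMap(lambda pc: [(pc[0][1], pc[1])] if pc[0][0] == target
                 else ([(pc[0][0], pc[1])] if pc[0][1] == target else []))
        .takeOrdered(10, key=lambda ic: -ic[1]))

print(recs)
sc.stop()
```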

Ankit Sharma

The following are some Hadoop project ideas:

-- Real estate analysis system
-- Sentiment analysis using Hadoop
-- Recommendation engine for inventory products
-- Product marketing research by analyzing products currently on the market

Kailash Aade

A project on MapReduce itself (algorithms, searching and sorting techniques, graphs) is also a very good candidate.
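
If you go that route, the map / shuffle-sort / reduce pattern is easy to study locally before moving to a cluster. The toy sketch below simulates the three phases in plain Python on a made-up graph problem (node degrees from an edge list); the edge data is invented.

```python
# A local simulation of the map / shuffle-sort / reduce phases on a toy
# graph problem: computing node degrees from an edge list. The edges are
# made-up sample data; on a cluster the same logic would be a MapReduce job.
from itertools import groupby
from operator import itemgetter

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]  # sample input

# Map: each undirected edge contributes one degree to both endpoints.
mapped = [(node, 1) for u, v in edges for node in (u, v)]

# Shuffle/sort: group records by key, which Hadoop does between phases.
mapped.sort(key=itemgetter(0))

# Reduce: sum the values for each key.
degrees = {node: sum(c for _, c in group)
           for node, group in groupby(mapped, key=itemgetter(0))}

print(degrees)  # {'A': 2, 'B': 2, 'C': 3, 'D': 1}
```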

Vamsi Mohan

You can go through the links below, which will help you start working on Hadoop projects. The first links are small MapReduce use cases in Hadoop and will help you get a hold on MapReduce concepts:

Link 1: https://acadgild.com/blog/mapreduce-use-case-uber-data-analysis/
Link 2: https://acadgild.com/blog/mapreduce-use-case-youtube-data-analysis/
Link 3: https://acadgild.com/blog/analyzing-titanic-data-with-hadoop-mapreduce/

The next links relate to sentiment analysis using Hadoop components such as Pig and Hive:

Link 4: https://acadgild.com/blog/daily-show-data-analysis-using-pig/
Link 5: https://acadgild.com/blog/pig-use-case-daily-show-data-analysis-part-ii/
Link 6: https://acadgild.com/blog/determining-popular-hashtags-in-twitter-using-pig/
Link 7: https://acadgild.com/blog/sentiment-analysis-on-twitter-timezone-wise-analysis/
Link 8: https://acadgild.com/blog/counting-hashtags-using-hive/
Link 9: https://acadgild.com/blog/sentiment-analysis-on-tweets-using-afinn-dictionary/
Link 10: https://acadgild.com/blog/sentiment-analysis-on-tweets-with-apache-hive-using-afinn-dictionary/

For beginner's-level use cases in Spark, refer to the links below:

Link 11: https://acadgild.com/blog/healthcare-use-case-apache-spark/
Link 12: https://acadgild.com/blog/introduction-spark-rdd-basic-operations-rdd/
Link 13: https://acadgild.com/blog/analyzing-new-york-crime-data-using-sparksql/
Link 14: https://acadgild.com/blog/spark-use-case-travel-data-analysis/
Link 15: https://acadgild.com/blog/spark-use-case-uber-data-analysis/
Link 16: https://acadgild.com/blog/spark-use-case-analyzing-movielens-dataset/
Link 17: https://acadgild.com/blog/spark-use-case-social-media-analysis/

Visit our website http://www.acadgild.com/ for more real-time use cases and projects on big data technologies like Hadoop, Spark, Machine Learning, etc.

Satyam Kumar

You have many options for your project. However, in my opinion, if you want to make a difference or prefer tougher topics, you should consider compression/decompression codecs, searching/sorting algorithms, and matching and signature-analysis algorithms for HDFS.
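
If the codec idea appeals to you, a simple first experiment is to compare compression ratio and speed of a few standard algorithms on a sample of your data; the file path below is a placeholder, and on HDFS you would additionally weigh splittability (bzip2 is splittable, plain gzip is not).

```python
# A first experiment for the codec idea: compare compression ratio and time
# of a few standard algorithms on a sample file. The path is a placeholder;
# use a file that resembles the data you would store in HDFS.
import bz2
import gzip
import lzma
import time

SAMPLE = "sample.log"  # placeholder: any representative data file

with open(SAMPLE, "rb") as f:
    data = f.read()

codecs = {"gzip": gzip.compress, "bzip2": bz2.compress, "lzma": lzma.compress}

for name, compress in codecs.items():
    start = time.time()
    compressed = compress(data)
    elapsed = time.time() - start
    print("%-6s ratio=%.2f  time=%.2fs" %
          (name, len(data) / max(len(compressed), 1), elapsed))
```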

Bezan
