What are the projects that can be done as a final year project on Hadoop?
-
I am a final-year student and an enthusiastic learner of Apache Hadoop and the MapReduce framework. Please suggest some projects that can be implemented with this technology. I have 6 months to learn and implement one. I also have an idea for a small search engine that indexes pages. Is this feasible to implement?
-
Answer:
I would suggest checking whether you can implement sentiment analytics on a Twitter data feed using Hive/Hadoop. Apache Hive provides built-in functions for algorithms such as n-grams and context n-grams: https://cwiki.apache.org/confluence/display/Hive/StatisticsAndDataMining
You can modularize your project into the following streams:
1. Collect the Twitter data feed using Twitter's API.
2. Pump the data coming from Twitter into a Hadoop cluster.
3. Design and develop Hive tables using the Twitter feed schema.
4. Run ngrams/context_ngrams on the Twitter data.
The link above also gives guidance on implementing the following use cases:
1. (ngrams) Find important topics in text in conjunction with a stopword list.
2. (ngrams) Find trending topics in text.
3. (context_ngrams) Extract marketing intelligence around certain words (e.g., "Twitter is ___").
4. (ngrams) Find frequently accessed URL sequences.
5. (context_ngrams) Find frequently accessed URL sequences that start or end at a particular URL.
6. (context_ngrams) Pre-compute common search lookaheads.
Hope this is helpful. Also take a look at this example of finding trending topics with stopwords: http://bigdatabloggin.blogspot.com/2012/08/trending-topics-in-hive-ngrams.html
Tanmay Deshpande at Quora
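To get a feel for the n-gram step before touching a cluster, the idea can be sketched in plain Python. This is only an illustration of what an n-gram frequency count with a stopword list computes; it is not Hive's implementation. The stopword list, the tokenizer, the sample tweets, and the choice to drop any n-gram that contains a stopword are all assumptions made for the example.

```python
from collections import Counter
import re

# Hypothetical stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "is", "a", "to", "and", "of", "in", "on", "for"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())

def top_ngrams(texts, n=2, k=3):
    """Return the k most frequent n-grams that contain no stopword."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            # Assumption: an n-gram is discarded if any of its words is a stopword.
            if not any(word in STOPWORDS for word in gram):
                counts[gram] += 1
    return counts.most_common(k)

# Hypothetical sample tweets.
tweets = [
    "Hadoop is great for big data",
    "Learning big data on Hadoop",
    "big data is the future",
]
print(top_ngrams(tweets, n=2, k=1))  # [(('big', 'data'), 3)]
```

On a real Twitter feed the same counting would be distributed, for example through Hive's n-gram functions described at the link above, rather than run in a single process.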
Other answers
Yes, indexing is a good use case. Aggregating counters and unique users on dimensions is another. So is analytics in general, along with compression, encoding, transformation, and ETL. Graph computation is faster in Spark.
Nicolae Marasoiu
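The indexing use case, which is also the search engine idea from the question, maps naturally onto MapReduce. Below is a minimal single-process sketch of an inverted index written as a map step and a reduce step. The page contents, file names, and function names are hypothetical, and a real Hadoop job would run the two steps in parallel over HDFS data rather than in memory.

```python
from collections import defaultdict
import re

def map_page(url, text):
    """Map step: emit one (word, url) pair per distinct word on the page."""
    for word in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield word, url

def reduce_postings(pairs):
    """Shuffle and reduce steps: group URLs by word into sorted posting lists."""
    index = defaultdict(set)
    for word, url in pairs:
        index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}

def search(index, query):
    """Return the pages that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)

# Hypothetical crawled pages.
pages = {
    "a.html": "Hadoop stores data in HDFS",
    "b.html": "MapReduce processes data on Hadoop",
    "c.html": "Hive runs SQL queries",
}
pairs = [pair for url, text in pages.items() for pair in map_page(url, text)]
index = reduce_postings(pairs)
print(search(index, "hadoop data"))  # ['a.html', 'b.html']
```

Ranking, crawling, and incremental updates are left out; each of them could be a separate milestone in a six-month project.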
I work in a similar domain. Here are a few topics where you can use Hadoop to deal with large datasets:
- Sentiment analysis for Twitter and web articles: Identify the overall sentiment of web articles, product reviews, movie reviews, and tweets. A lexicon-based approach or machine learning techniques can be used.
- Web article classification/summarization: Use clustering/classification techniques to classify web articles, and perform semantic analysis to summarize them.
- Recommendation system based on users' social media profiles: Use social media APIs to collect user interests from Facebook, Twitter, etc., and implement a recommendation system based on those interests.
- Tweet classification and trend detection: Classify tweets into sports, business, politics, entertainment, etc., and detect trending tweets in those domains.
- Movie review prediction: Use online movie reviews to predict reviews of new movies.
- Summarize restaurant reviews: Take a list of reviews about a restaurant and generate a single English summary for that restaurant.
- AutoBot: Build a system that can have a conversation with you. The user types messages, and your system replies based on the user's text. There are many approaches here; for example, you could use a large Twitter corpus and do language similarity.
- Twitter-based news system: Collect tweets for various categories on an hourly or daily basis, identify trending discussions, perform semantic analysis, and create a kind of news system (check the Frrole product).
Here are a few datasets I have compiled -
Pathan Karimkhan
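The lexicon-based approach to sentiment analysis mentioned in the answer above can be sketched as follows. The tiny lexicon and its scores are invented for illustration; published lexicons such as AFINN cover far more words. The scoring rule (sum of word scores, sign decides the label) is one simple assumption among many possible ones.

```python
import re

# Hypothetical miniature lexicon with made-up scores.
LEXICON = {"good": 2, "great": 3, "love": 3, "bad": -2, "terrible": -3, "hate": -3}

def sentiment_score(text):
    """Sum the lexicon scores of all words in the text; unknown words score 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)

def label(text):
    """Map the total score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(label("I love this great movie"))   # positive
print(label("Terrible, I hate it"))       # negative
```

Because each text is scored independently, the same function could serve as the body of a mapper or a Hive/Pig user-defined function when the data no longer fits on one machine.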
Thanks for the A2A. I would suggest working on a data science use case, such as a recommendation system, that uses Hadoop/Spark as the computing platform. Prefer not to use built-in machine learning libraries like Mahout/MLlib.
Ankit Sharma
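Writing the recommender without a built-in library, as suggested above, could start from something as simple as item co-occurrence counting. The sketch below is one possible starting point, not a method prescribed by the answer; the user histories and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(histories):
    """Count how often each pair of items appears in the same user's history."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(user, histories, counts, k=2):
    """Recommend up to k unseen items that co-occur most with the user's items."""
    seen = set(histories[user])
    scores = defaultdict(int)
    for item in seen:
        for other, count in counts[item].items():
            if other not in seen:
                scores[other] += count
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]

# Hypothetical user histories.
histories = {
    "u1": ["hadoop", "hive", "pig"],
    "u2": ["hadoop", "hive"],
    "u3": ["hadoop", "spark"],
    "u4": ["hive"],
}
counts = cooccurrence(histories)
print(recommend("u4", histories, counts))  # ['hadoop', 'pig']
```

The pair counting is the expensive part and is the piece that would be expressed as a MapReduce or Spark job on real data.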
The following are some Hadoop projects:
-- Real estate analysis system
-- Sentiment analysis using Hadoop
-- Recommendation engine for inventory products
-- Product marketing research by analyzing products currently on the market
Kailash Aade
A project on MapReduce (algorithms, searching and sorting techniques, graphs) is a very good candidate for a project.
Vamsi Mohan
You can go through the links below, which will help you start working on Hadoop projects.
The following links cover small MapReduce use cases in Hadoop and will help you get a hold of MapReduce concepts:
Link 1: https://acadgild.com/blog/mapreduce-use-case-uber-data-analysis/
Link 2: https://acadgild.com/blog/mapreduce-use-case-youtube-data-analysis/
Link 3: https://acadgild.com/blog/analyzing-titanic-data-with-hadoop-mapreduce/
The links below are related to sentiment analysis using Hadoop components such as Pig and Hive:
Link 4: https://acadgild.com/blog/daily-show-data-analysis-using-pig/
Link 5: https://acadgild.com/blog/pig-use-case-daily-show-data-analysis-part-ii/
Link 6: https://acadgild.com/blog/determining-popular-hashtags-in-twitter-using-pig/
Link 7: https://acadgild.com/blog/sentiment-analysis-on-twitter-timezone-wise-analysis/
Link 8: https://acadgild.com/blog/counting-hashtags-using-hive/
Link 9: https://acadgild.com/blog/sentiment-analysis-on-tweets-using-afinn-dictionary/
Link 10: https://acadgild.com/blog/sentiment-analysis-on-tweets-with-apache-hive-using-afinn-dictionary/
For beginner-level use cases in Spark, refer to the links below:
Link 11: https://acadgild.com/blog/healthcare-use-case-apache-spark/
Link 12: https://acadgild.com/blog/introduction-spark-rdd-basic-operations-rdd/
Link 13: https://acadgild.com/blog/analyzing-new-york-crime-data-using-sparksql/
Link 14: https://acadgild.com/blog/spark-use-case-travel-data-analysis/
Link 15: https://acadgild.com/blog/spark-use-case-uber-data-analysis/
Link 16: https://acadgild.com/blog/spark-use-case-analyzing-movielens-dataset/
Link 17: https://acadgild.com/blog/spark-use-case-social-media-analysis/
Visit our website http://www.acadgild.com/ for more real-time use cases and projects on big data technologies like Hadoop, Spark, machine learning, etc.
Satyam Kumar | Hadoop Developer at Acadgild
You have many options for your project. However, in my opinion, if you want to make a difference or prefer tougher topics, you should consider compression/decompression codecs, searching/sorting algorithms, and matching and signature analysis algorithms for HDFS.
Bezan