What are the ways to re-architect a Hadoop MapReduce job which requires loading of large models into RAM to facilitate processing?
I'm working on an MR job that processes real-time data from various sources. For the processing, each map task has to load several GB of resources into memory. This memory overhead limits the number of map and reduce tasks that can run in parallel on a machine, because Hadoop runs each map task in its own JVM instance and each instance loads its own copy of the resources. Is there a way for all map tasks to access the resources from shared memory? If not, would a web-service-based model (hosting the memory-dependent module as a service on one or more separate machines) scale to tens of millions of requests per day?
Answer:
You would be better off not using MapReduce. The problem statement has all the classic signs of trying to shoehorn an algorithm into MR when another framework would be a much better fit. If you want to stay within the Hadoop universe, take a look at YARN, which is part of the Hadoop 2.x code base. You'll need to write your own Application Master, etc., to do what you want, but it seems to me that this is the better answer.
Allen Wittenauer at Quora
Other answers
It would be good to know exactly what those resources are, but as far as I understand they are probably being used as a big lookup table. Is that true? If so, there is usually a way to avoid in-memory lookups in MapReduce. If you are performing a lookup by some field (e.g. customer_id) in an in-memory table, you can add the table as another input to the MapReduce job and perform a join on customer_id. That way the MapReduce shuffle does all the lookup work for you: it groups the records belonging to a given customer_id together with the lookup-table records that match the same customer_id, and you receive all of that data together in the same reduce group. I hope that sheds some light; otherwise, please add more detail to your problem statement.
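To make the join idea concrete, here is a minimal sketch in plain Python that only simulates the map, shuffle and reduce steps in a single process. It is not Hadoop code and does not use the Hadoop API. The sample data and the names `LOOKUP`, `EVENTS`, `map_phase`, `shuffle`, `reduce_phase` and `run_join` are all hypothetical, chosen for illustration. In a real job the framework performs the shuffle, and each reduce call sees only the group for one customer_id rather than the whole table.

```python
from collections import defaultdict

# Hypothetical sample data: (customer_id, payload) pairs.
LOOKUP = [("c1", "gold"), ("c2", "silver")]          # the former in-memory table
EVENTS = [("c1", "click"), ("c2", "view"),
          ("c1", "purchase"), ("c3", "view")]        # the records to process


def map_phase(lookup, events):
    """Emit (key, tagged value) pairs; the tag records which input a value came from."""
    for customer_id, tier in lookup:
        yield customer_id, ("L", tier)
    for customer_id, action in events:
        yield customer_id, ("E", action)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(customer_id, values):
    """Join one customer's events with that customer's lookup entries."""
    tiers = [v for tag, v in values if tag == "L"]
    actions = [v for tag, v in values if tag == "E"]
    for action in actions:
        # Events without a lookup match are kept with tier None (left outer join).
        for tier in tiers or [None]:
            yield customer_id, action, tier


def run_join(lookup, events):
    """Run the simulated job and return the joined records in key order."""
    groups = shuffle(map_phase(lookup, events))
    joined = []
    for customer_id in sorted(groups):
        joined.extend(reduce_phase(customer_id, groups[customer_id]))
    return joined


if __name__ == "__main__":
    for row in run_join(LOOKUP, EVENTS):
        print(row)
```

The point of the sketch is that no task ever needs the whole lookup table: the reduce step handles one customer_id at a time and only needs the values that share that key.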
Pere Ferrera Bertran