I have a reducer implemented in Perl which takes massively longer to run on the cluster than it does locally. Why might this be?
-
Specifically, one reduce task has been running for over 24 hours, while the others completed in under 10 minutes. This one task received a key with a relatively large amount of data, about 500K records, and performs a clustering task which should run in roughly n^2 time. When I pull the input data associated with this key and the adjacent keys to my local machine and do a test run like "cat data.txt | perl map.pl | sort -k1 | perl reduce.pl", the computation is done in less than a minute. I know you can't debug my code without details, but I'm just wondering if anyone has seen anything similar before.
-
Answer:
It is hard to say based on the info you give, but there is a good chance there is a lot less free memory on the hadoop node and your task is swapping like mad. When things suddenly get 100,000 times slower this is usually the cause. You can diagnose this with iostat, or better yet install ganglia and get a view of your whole cluster in graphical form.
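A quick way to check for swapping directly on the suspect node (a minimal sketch, assuming a Linux node with /proc/meminfo; iostat or ganglia, as suggested above, give a fuller picture):

```shell
# Rough swap check while the stuck task is running
swap_total=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
swap_free=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
echo "swap used: $((swap_total - swap_free)) kB"
# For live paging rates, watch the si/so columns (nonzero = paging):
#   vmstat 2
# For per-device utilization (%util near 100 on the swap device is a red flag):
#   iostat -x 2
```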
Jay Kreps at Quora
Other answers
Sometimes this is a symptom of faulty infrastructure: CPU, disk, etc. Do you have other reduce tasks that finished quickly on this same machine? In general, this problem is exactly the type that speculative execution (enabled by default; check mapred.{map,reduce}.tasks.speculative.execution) is supposed to solve. If there is a task that is taking much longer than the rest, Hadoop should automatically spawn another instance of it, and occasionally this second instance will finish before the first. Do you have speculative execution turned off? What do the logs for this particular task show? In particular, there are R/W loglines which show read/write progress through the reducer. Do these logs indicate: (a) a stuck reducer, or (b) slow-but-steady progress?
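For reference, those properties live in mapred-site.xml on pre-YARN Hadoop versions; a sketch showing the default values, not a recommended change:

```xml
<!-- mapred-site.xml: speculative execution is on by default -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>true</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>true</value>
</property>
```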
Norbert Burger
There may be multiple reasons for stragglers in a MapReduce cluster. The following OSDI '10 paper covers several observed in production clusters: http://research.microsoft.com/en-us/um/people/srikanth/data/mantri_osdi10.pdf It may be worth checking whether any of them fits your case.
Bikash Sharma
I realized much later that the problem was due to memory issues on the cluster. I didn't realize that the memory allocated to map and reduce tasks applies to the parent Java process that spawns the streaming script, and that my Perl script had to make do with what was left over. I had scaled up the task memory allocation thinking that my Perl script would share the budget, so Perl was furiously writing to swap because it had only a few megabytes of memory to work with. No error was generated, so I didn't notice until larger runs started causing Perl to throw out-of-memory errors.
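In Hadoop streaming, mapred.child.java.opts sizes only the JVM that forks the script; the Perl process must fit in whatever the task slot has left. An illustrative config sketch (the -Xmx value is a made-up example, not from the original post):

```xml
<!-- Illustrative only: if the task slot allows ~2 GB, a large -Xmx leaves
     the streaming child almost nothing. Shrinking the JVM heap leaves
     headroom for the Perl process. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```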
Alex Hasha