How do I debug my map-reduce program while using an IDE?
I wrote my Map and Reduce classes in IntelliJ IDEA, and my Hadoop is running in pseudo-distributed mode. I want to run the program on a small set of data from the IDE to make sure everything works. Do I have to submit the job through the command line and watch the Hadoop logs, or is there a better way?
Answer:
I strongly recommend using something like MRUnit [1], which supplies the test infrastructure needed to mock out Hadoop, to test map and reduce classes rather than attempting to test jobs in the context of the framework itself. Remember that you want unit tests to prove functionality and correctness, and integration tests on a small (but real) cluster to prove the glue between your code and the infrastructure.

Both mappers and reducers should be self-contained (i.e. have no externally visible side effects), which makes testing trivial (in theory). In other words, you should be able to pass a mapper a key/value pair and get the intermediate key/value pair you expect. The same is true for your reducer. You don't need the rest of Hadoop to prove that this works. Draw a clear line between what your code does and what the framework does, and make your code behave according to the contract of the system. Integration testing should then really just be about throwing real-world data at the code.

When you find a record that breaks your code, copy it into your MRUnit test input dataset, masking sensitive fields accordingly, and you have started building regression validation against known problems. If your test input dataset in unit tests is over ~100 records, you're probably overthinking the problem.

Once unit testing is complete, build a set of larger, real-world datasets that run on a test cluster. Drive these tests automatically and validate the results regularly (e.g. hourly or when a new build completes).

[1] http://mrunit.apache.org/
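To illustrate the "pass in a key/value pair, check the pair that comes out" idea, here is a minimal sketch in plain Java. It is not MRUnit code and not the answer author's code: the class name WordCountLogic and the word-count logic are hypothetical, chosen only so that the example runs without any Hadoop dependency. In a real project the same logic would sit inside your Mapper and Reducer classes, and MRUnit's MapDriver and ReduceDriver would feed them inputs and check outputs in much the same way.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical word-count logic, kept free of Hadoop types so it can be
// exercised from the IDE without a cluster.
class WordCountLogic {

    // Map step: one input line in, one (word, 1) pair out per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token.toLowerCase(), 1));
            }
        }
        return out;
    }

    // Reduce step: all counts for one key in, their sum out.
    static int reduce(String key, Iterable<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Run from the IDE with a breakpoint in map() or reduce().
        System.out.println(map("Hello hadoop hello"));
        System.out.println(reduce("hello", Arrays.asList(1, 1)));
    }
}
```

Because neither method touches the file system or shared state, a record that breaks production can be pasted into a test as a single string and replayed under the debugger.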
Eric Sammer at Quora
Other answers
As I understand it, your problem is to validate your map-reduce code on a dummy cluster. Here are a few options:
1. As Eric answered, you can write MRUnit tests. This is a very simple way to test a mapper alone, a reducer alone, and a mapper and reducer in combination. The tests can be executed in the IDE, but they don't launch any distributed environment for your job.
2. Use counters. The only issue with counters is that if a job fails and re-runs, your counters can report wrong numbers.
3. Use MiniCluster (http://javasourcecode.org/html/open-source/hadoop/hadoop-0.20.203.0/org/apache/hadoop/hdfs/MiniDFSCluster.html). This is one of the most complete ways to test and validate your code. It launches a dummy cluster with 2 datanodes and a distributed file system, and it can be executed from your IDE. It can't be debugged, as new processes are launched for mappers and reducers. The only problem with it is that it takes somewhat longer to run than MRUnit.
4. If you want to debug your code, try running it with LocalJobRunner.
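For option 4, the sketch below lists the configuration properties that typically select the local runner, which executes map and reduce tasks inside the same JVM as the driver so that IDE breakpoints are hit. The property names are an assumption about your Hadoop version: they are the Hadoop 2.x and later names, while 0.20/1.x releases used mapred.job.tracker=local and fs.default.name instead. The class LocalRunnerSettings is hypothetical and deliberately avoids Hadoop imports. In a real driver you would pass each pair to Configuration.set(key, value) before creating the job.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the settings that switch a job to local mode.
class LocalRunnerSettings {

    static Map<String, String> localModeProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        // Run the job in-process instead of submitting it to a cluster
        // (Hadoop 2.x+ property name; an assumption about your version).
        props.put("mapreduce.framework.name", "local");
        // Read input and write output on the local file system instead of HDFS.
        props.put("fs.defaultFS", "file:///");
        return props;
    }

    public static void main(String[] args) {
        // In a real driver: for each entry, call conf.set(key, value).
        localModeProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With these settings the job runs as a single local process, so it suits small input files and step-through debugging rather than performance checks.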
Neeraj Chaplot