
How do I find the mean and median using Hadoop streaming?

  • I know this is not the right place for this question, but I have already asked it on the Hadoop mailing list and posted it here: http://stackoverflow.com/questions/15752444/finding-mean-median-using-python-hadoop-streaming?noredirect=1#comment22387897_15752444

    I have data like:

    id,value
    1,2.0
    1,3.0
    1,5.0
    2,3.2

    and so on. I want to find the mean and median for each id, so the mean for id 1 is (2 + 3 + 5) / 3, and similarly for the median. I thought I understood how to write Hadoop streaming code in Python, but apparently not (my misconceptions and errors are explained in the link). How do I calculate the mean and median using Python Hadoop streaming?

  • Answer:

    Mean

    Have the mapper emit 1 for the key and the pair {1, x} for the value, where x is the value input to the mapper. The reducer then iterates through the values passed in, summing each element of the pair separately. The sum of the first element is the count of values, and the sum of the second element is the total. A simple post-processing step can then divide the total by the count. You should set the number of reducers to one, since there is only one key. To prevent the reducer from becoming the bottleneck, you'll also want to use a combiner with the same function as the reducer.

    Median

    Have a stateful mapper that builds a histogram of its inputs and emits it only once, inside cleanup(). Set the key to 1 and the value to the histogram. The reducer then sums the histograms from each mapper to obtain the complete histogram and uses it to determine the median. There is only one key, so the number of reducers should be set to one.
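The two recipes above can be sketched in plain Python. This is only an illustration, not the answerer's code: the function names are hypothetical, and the functions take and return Python iterables instead of reading `sys.stdin` and printing tab-separated lines as real Hadoop streaming scripts would. It also assumes the key is the `id` column (to match the per-id result the question asks for) rather than the constant key 1, and that the histogram counts exact values; for continuous data you would likely bin the values, which makes the median approximate.

```python
from collections import Counter, defaultdict


def parse(lines):
    """Yield (id, value) from 'id,value' records, skipping blanks and the header."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("id"):
            continue
        key, value = line.split(",")
        yield key, float(value)


def map_mean(lines):
    """Mapper: emit (key, (1, x)) for every input value x."""
    for key, x in parse(lines):
        yield key, (1, x)


def reduce_pairs(pairs):
    """Reducer, also usable as the combiner: sum counts and totals per key."""
    acc = defaultdict(lambda: [0, 0.0])
    for key, (count, total) in pairs:
        acc[key][0] += count
        acc[key][1] += total
    return {key: (count, total) for key, (count, total) in acc.items()}


def finish_mean(reduced):
    """Post-processing step: divide each total by its count."""
    return {key: total / count for key, (count, total) in reduced.items()}


def map_histograms(lines):
    """Stateful mapper: build one histogram per key, emit only at the end."""
    hists = defaultdict(Counter)
    for key, x in parse(lines):
        hists[key][x] += 1
    # Corresponds to emitting once from cleanup(), after all input is read.
    yield from hists.items()


def median_from_histogram(hist):
    """Walk the sorted histogram until the middle rank(s) are covered."""
    n = sum(hist.values())
    lo_rank, hi_rank = (n - 1) // 2, n // 2  # equal when n is odd
    seen, lo = 0, None
    for value in sorted(hist):
        seen += hist[value]
        if lo is None and seen > lo_rank:
            lo = value
        if seen > hi_rank:
            return (lo + value) / 2
    raise ValueError("empty histogram")


def reduce_median(histograms):
    """Reducer: sum the mappers' histograms per key, then take the median."""
    merged = defaultdict(Counter)
    for key, hist in histograms:
        merged[key].update(hist)  # Counter.update adds counts
    return {key: median_from_histogram(h) for key, h in merged.items()}
```

For example, `finish_mean(reduce_pairs(map_mean(records)))` and `reduce_median(map_histograms(records))` each return a dict keyed by id when `records` is a list of the input lines. Because `reduce_pairs` only adds up pairs, running it on partial outputs and then again on the merged results gives the same sums, which is why the same function can serve as the combiner.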

Raphael Cendrillon at Quora
