Which underlying architecture should MapReduce adopt: the converged (shared-nothing) architecture or the segregated (shared-storage) architecture?
-
The original MapReduce paper advocates the converged (shared-nothing) architecture. In practice, however, I see what looks like an anti-pattern. Amazon offers separate storage services (S3 and Elastic Block Store) and a compute service (EC2), and on top of them it also offers a MapReduce service (Elastic MapReduce, EMR). As I understand it, the storage cloud and the compute cloud are separate, at least logically (I am not sure whether the underlying infrastructure is shared).

The MapReduce framework, by contrast, assumes a shared-nothing, converged architecture in which compute and storage are colocated. One of its key ideas is to ship the user program to the data, rather than the data to the program, during distributed parallel execution. With the segregated architecture of Amazon EMR (correct me if I am wrong), data locality cannot be exposed to the MapReduce runtime. How, then, does Amazon EMR avoid transferring data between the storage cloud and the compute nodes, and the potential bottleneck on the interconnect? If it cannot, that seems to violate MapReduce's design philosophy of moving programs instead of data wherever possible.
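To make the locality argument concrete, here is a minimal sketch of locality-aware map-task placement. It is not Hadoop's or EMR's actual scheduler: the function name `schedule_maps`, the per-node `capacity` limit, and the greedy least-loaded rule are illustrative assumptions. Each input block is placed on a compute node that holds a replica when such a node has a free slot; otherwise it goes to the least-loaded node, which then has to read the block over the network.

```python
from typing import Dict, List, Tuple


def schedule_maps(
    block_replicas: Dict[str, List[str]],
    nodes: List[str],
    capacity: int,
) -> Dict[str, Tuple[str, bool]]:
    """Assign one map task per block; return {block: (node, is_data_local)}.

    Hypothetical greedy policy (an assumption, not Hadoop's real scheduler):
    prefer the least-loaded replica holder that still has a free slot,
    otherwise fall back to the least-loaded node overall (a remote read).
    """
    load = {n: 0 for n in nodes}  # tasks assigned so far to each compute node
    plan: Dict[str, Tuple[str, bool]] = {}
    for block, replicas in block_replicas.items():
        # Replica holders that are compute nodes and still have spare capacity.
        candidates = [n for n in replicas if n in load and load[n] < capacity]
        pool = candidates if candidates else nodes
        # Break ties on load by node name so the result is deterministic.
        node = min(pool, key=lambda n: (load[n], n))
        load[node] += 1
        plan[block] = (node, node in replicas)
    return plan


if __name__ == "__main__":
    # Converged layout: blocks are replicated on the compute nodes themselves.
    converged = {"b0": ["n1", "n2"], "b1": ["n1", "n3"], "b2": ["n1", "n2"], "b3": ["n2"]}
    for block, (node, local) in schedule_maps(converged, ["n1", "n2", "n3"], capacity=1).items():
        print(block, "->", node, "local" if local else "remote")
```

In a segregated setup, every replica lives in the storage service rather than on a compute node, so no task can be placed locally: `schedule_maps({"b0": ["s3"]}, ["n1"], capacity=1)` returns `{"b0": ("n1", False)}`.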
-
Answer:
EMR spins up a temporary Hadoop cluster and sets up a small HDFS cluster across the workers for its own use. Yes, the input data must first be pulled in from S3, and the output must be written back to S3 at the end. Inside the cluster, HDFS does take advantage of locality and the shared-nothing architecture, but that HDFS cluster is transient. So yes, this is not quite optimal: the data has to be moved into the computation's temporary world and then out again. Still, it can be just fine for some use cases.
Sean Owen at Quora
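The trade-off described in the answer can be put into rough numbers. The sketch below is a back-of-the-envelope cost model, not anything EMR itself computes: it counts bytes crossing the network when the input is staged into a transient HDFS cluster once versus read from remote storage on every pass. The function name `network_bytes`, the `passes` parameter (how many times jobs during the cluster's lifetime read the same input), and the simplifications (perfect locality after staging, shuffle traffic ignored because it is the same either way, replication of the output ignored) are all assumptions.

```python
def network_bytes(input_bytes: int, output_bytes: int, passes: int,
                  replication: int, staged: bool) -> int:
    """Rough count of bytes moved over the network for one cluster lifetime.

    Hypothetical model (assumptions, not measured EMR behaviour):
    - staged=True: copy the input from remote storage into HDFS once; the HDFS
      write pipeline sends (replication - 1) extra copies between workers, and
      every pass then reads local blocks at no network cost.
    - staged=False: every pass reads the full input from remote storage.
    In both cases the final output is written back to remote storage once, and
    shuffle traffic is left out because it is identical in both modes.
    """
    if staged:
        input_traffic = input_bytes * replication  # 1 remote copy + (r - 1) replicas
    else:
        input_traffic = input_bytes * passes       # remote read on every pass
    return input_traffic + output_bytes


if __name__ == "__main__":
    for passes in (1, 2, 5):
        staged = network_bytes(100, 10, passes, replication=2, staged=True)
        direct = network_bytes(100, 10, passes, replication=2, staged=False)
        print(f"passes={passes}: staged={staged} direct={direct}")
```

Under these assumptions, staging only wins once the same input is read more times than it is replicated (`passes > replication`). A single pass over the data gains nothing from the copy, which fits the answer's point that the approach is not optimal in general but can be fine for some use cases.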