Which underlying architecture should MapReduce use: the converged (shared-nothing) architecture or the segregated (shared-storage) architecture?

  • The original MapReduce paper advocates the converged (shared-nothing) architecture, in which compute and storage are colocated. In practice, however, I see what looks like a counterexample. Amazon offers storage services (S3 and Elastic Block Store) and a compute service (EC2), and it also offers a MapReduce service (Elastic MapReduce). As I understand it, the storage cloud and the compute cloud are separate, at least logically (I am not sure whether the underlying infrastructure is shared). One of the key ideas of MapReduce is to move the user program to the data, rather than the data to the program, during distributed parallel execution. With the segregated architecture of Amazon EMR (correct me if I am wrong), data locality cannot be exposed to the MapReduce runtime. How, then, does Amazon EMR avoid transferring data between the storage cloud and the compute nodes, and the potential bottleneck on the interconnect? If it cannot, it seems to violate the design philosophy of MapReduce, which tries to move user programs instead of data wherever possible.

  • Answer:

    EMR spins up a temporary Hadoop cluster and sets up a small HDFS cluster across the workers for its own use. Yes, data must be pulled in from S3 at the start, and output must be written back to S3 at the end. Internally, the HDFS cluster takes advantage of locality and the shared-nothing architecture, but that cluster is transient. So yes, this is not quite optimal, since one must move the data into the computation's temporary world and then out again, but it can be just fine in some use cases.

Sean Owen at Quora
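To make the trade-off in the answer concrete, here is a toy Python sketch (an illustration only, not EMR's or Hadoop's actual scheduler). It counts the megabytes that cross the network when input is staged once onto the workers and then processed with locality-aware task placement, versus when every map pass reads its input from remote storage. All names (`Block`, `stage_in`, `schedule_local`, `total_transfer`) are hypothetical, and the model assumes round-robin block placement, no replication, and no shuffle traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: int
    size_mb: int
    holder: str  # hypothetical: worker whose local disk stores this block


def stage_in(block_sizes_mb, workers):
    """Toy stand-in for the S3 -> transient HDFS copy: place blocks round-robin."""
    blocks = [
        Block(i, size, workers[i % len(workers)])
        for i, size in enumerate(block_sizes_mb)
    ]
    staged_mb = sum(block_sizes_mb)  # every block crosses the network once
    return blocks, staged_mb


def schedule_local(blocks):
    """Locality-aware plan: run each map task on the worker holding its block."""
    return {b.block_id: b.holder for b in blocks}


def network_mb(blocks, plan):
    """MB that a single map pass pulls over the network under the given plan."""
    return sum(b.size_mb for b in blocks if plan[b.block_id] != b.holder)


def total_transfer(block_sizes_mb, workers, output_mb, passes):
    """Compare total network traffic of the two layouts over `passes` map passes."""
    blocks, staged_mb = stage_in(block_sizes_mb, workers)
    per_pass_local = network_mb(blocks, schedule_local(blocks))  # 0 by construction
    staged = staged_mb + passes * per_pass_local + output_mb
    # Assumed alternative: no staging, so every pass re-reads all input remotely.
    remote = passes * sum(block_sizes_mb) + output_mb
    return {"staged": staged, "remote": remote}


if __name__ == "__main__":
    print(total_transfer([128] * 8, ["w0", "w1", "w2"], output_mb=64, passes=3))
```

Under these assumptions, a single pass over the data moves the same number of bytes either way, which fits the remark that the staged approach "can be just fine in some use cases"; the staged layout only pulls ahead in this model when the same input is read more than once.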
