How to sequence events while uploading large files to Amazon S3?

How does Amazon EMR Hive parallelize when operating on files stored in Amazon S3?

  • In a typical HDFS setup, the raw files can be partitioned by directory structure, and an external Hive table (with partitioning defined) can be created over them. This lets the cluster administrator optimize how the data is laid out. In an Amazon EMR and S3 setup, the files are stored in a single S3 bucket, so how does Amazon EMR Hive know how best to parallelize the job?

  • Answer:

    It appears there is no way for Hive nodes to have locality of reference when using S3. The S3 bucket is treated as a distributed filesystem: every node accesses it through the same API entry point, and EMR talks to S3 over the REST API just like any other application.
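Even without HDFS data locality, the partitioned-external-table pattern from the question still applies to S3: partitions map to key prefixes under the table's `LOCATION`, so Hive can prune input by partition value and split the remaining objects across mappers. A minimal sketch of that pattern (the bucket name, columns, and partition layout below are hypothetical, not from the original answer):

```sql
-- Hypothetical external Hive table backed by S3, partitioned by date.
CREATE EXTERNAL TABLE access_logs (
  request_ip   STRING,
  request_time STRING,
  bytes_sent   BIGINT
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://my-bucket/logs/';

-- Register a partition; queries filtering on dt only read this key prefix.
ALTER TABLE access_logs ADD PARTITION (dt = '2024-01-01')
  LOCATION 's3://my-bucket/logs/dt=2024-01-01/';
```

With this layout, a query such as `SELECT ... WHERE dt = '2024-01-01'` lists and reads only the objects under that prefix, even though every node fetches them over the same S3 API rather than from local disks.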

Miguel Paraz at Quora
