What are good ways to optimize Hadoop based on the NFS-like FileSystem?

  • I am writing to ask about two things:

    1. Ideas for performance optimization inside the Hadoop framework when it runs on an NFS-like shared file system.
    2. Whether HDFS should support a POSIX or NFS-like interface (this thread may also be a good place to discuss that).

    The Hadoop MapReduce framework depends heavily on the distributed file system: the master (JobTracker) splits the input file and assigns the individual parts to separate slaves (TaskTrackers). Because the files are kept on the shared distributed file system, if a task or region server fails, the JobTracker can immediately reassign the task to a different TaskTracker that also holds a replica of the input file's data block.

    Under some circumstances, local file access may be slower than remote NFS file access, either today or in the future. Suppose we have a fast interconnect (say, 10 Gbps Ethernet or InfiniBand, fast enough that the network outruns a local SATA hard disk), and an NFS-like shared file system that can be mounted on every TaskTracker node in the Hadoop cluster. The NFS-like, POSIX-compatible shared FS could be the MapR file system, Oracle Lustre, IBM GPFS, EMC Isilon, Ceph, Gluster, etc. These systems have very advanced caching algorithms that hide remote disk latency through highly optimized prefetch strategies, and they can store a huge number of files (far more than 100 million).

    I am wondering what this means for the current Hadoop framework: which components of Hadoop MapReduce could be modified to speed up the whole cluster and make Hadoop programming much easier?

    The following components could be modified to improve performance:

    1. Partitioning of map output: currently the map output is partitioned for the different reducers using an index file, and it would be stored on the NFS. An HTTP file transfer is also needed for each reducer to fetch its map-output partition from each mapper. Instead, a map task could write separate output files for each reducer, and each reducer could access those files directly. There is also the question of what happens to the map-output merge thread.
    2. Job input and output: with NFS-like storage we no longer need to copy files into and out of HDFS, which saves time in the job prolog/epilog.

    The following could decrease job-submit latency, so jobs start more quickly:

    1. Distributed cache: since files can be accessed directly through a standard POSIX interface, the distributed cache is unnecessary. We can simply put data into an NFS-mounted directory so that other nodes can open and read/write it directly.
    2. Setup phase of the job and tasks: the job JAR, JobToken, _partition.lst, and the other files that must be delivered to each TaskTracker can be stored in the NFS-mounted directory, where the other TaskTrackers can access them directly. Eliminating the distributed cache and the copying between local disk and HDFS would also make Hadoop MapReduce programming much easier.

    Beyond all of the above, HBase may also benefit from an NFS-like shared file system.

    As far as I know, plenty of NFS-like file systems already support Hadoop MapReduce, and several of them have optimizations based on NFS features, but unfortunately these are closed source:

    1. MapR has an NFS-like distributed file system and its own MapReduce release (http://www.mapr.com/Download-document/9-NFS-Technical-Brief).
    2. Greenplum has a team working on optimizing Greenplum HD for EMC Isilon (http://www.greenplum.com/products/pivotal-hd).
    3. IBM once submitted a patch on the Hadoop JIRA to make Hadoop support IBM GPFS, but the patch cannot be found at present.

    Does anyone have any details about these implementations?
This is very interesting, right?
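To make the first idea above concrete, here is a minimal sketch (purely illustrative; the directory layout, file names, and toy partitioner are my own assumptions, not Hadoop's actual shuffle format) of mappers writing per-reducer partition files directly into a shared POSIX-mounted directory, with reducers opening those files with ordinary I/O instead of fetching them over HTTP:

```python
import os
import tempfile

# Stand-in for an NFS-mounted directory visible to every node.
SHARED_DIR = tempfile.mkdtemp(prefix="shuffle-")

NUM_REDUCERS = 2

def partition(key):
    # Toy partitioner (Hadoop really uses key.hashCode() % numReduceTasks).
    return sum(ord(c) for c in key) % NUM_REDUCERS

def map_task(task_id, records):
    """Partition records by key and write one file per reducer
    directly into the shared directory -- no HTTP transfer needed."""
    partitions = {r: [] for r in range(NUM_REDUCERS)}
    for key, value in records:
        partitions[partition(key)].append((key, value))
    for reducer_id, part in partitions.items():
        path = os.path.join(SHARED_DIR, f"map-{task_id}-part-{reducer_id}")
        with open(path, "w") as f:
            for key, value in part:
                f.write(f"{key}\t{value}\n")

def reduce_task(reducer_id):
    """Open each mapper's partition file for this reducer with plain
    POSIX I/O; on a real shared FS these opens would hit the NFS mount."""
    pairs = []
    for name in sorted(os.listdir(SHARED_DIR)):
        if name.endswith(f"part-{reducer_id}"):
            with open(os.path.join(SHARED_DIR, name)) as f:
                pairs.extend(line.rstrip("\n").split("\t") for line in f)
    return pairs

map_task(0, [("a", 1), ("b", 2)])
map_task(1, [("a", 3)])
print(reduce_task(1))  # → [['a', '1'], ['a', '3']]
```

In this scheme the index file and the HTTP servlet disappear entirely; whether that wins in practice depends on how well the shared file system handles many small files and concurrent opens.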

  • Answer:

    I can provide information about MapR. As part of the MapR distribution for Hadoop, MapR provides a data platform that internally uses a protobuf-based protocol, different from both NFS and HDFS. A library provides an HDFS-compatible API for the MapR data platform, and there is also a high-performance distributed NFS server that allows genuine NFS access (not merely NFS-like), so your programs can use conventional I/O to access files in the cluster. There is also an HBase-compatible API that allows very high-performance access to tables stored on the MapR data platform.

    As a measure of the performance of the MapR file system: the recent minute-sort record, set on Google Compute Engine using MapR's distribution, used the MapR file system for the shuffle phase of the computation. Hadoop used to do something like this, but the performance of HDFS was too poor to keep doing it for long. MapR FS, in contrast, is able to support shuffle loads and achieve higher performance than the normal shuffle function in Hadoop. As a gross measure of this performance: during the record-setting minute-sort runs, shuffle data was transferred from mappers to reducers by creating, writing, and closing files on one node (the mapper), then opening, reading, closing, and deleting those files from another machine (the reducer). In total, nearly 4.5 million files were created, read, and deleted in 20 seconds, containing an aggregate volume of 1.5 TB. Very few file systems, distributed or otherwise, can achieve this level of performance.

    This has a very direct impact on normal operations in many of the ways you mention. You can store shared data on the shared MapR file system via NFS to avoid the limitations of the distributed cache, to distribute executables or shared JARs, or to run legacy non-Hadoop programs as part of map-reduce jobs.

    You can find detailed information about the MapR file system by searching for videos of presentations that Srivas (MapR's CTO) or I have given on the subject. My SlideShare also has many of these presentations. To answer your final question: yes, this is very interesting!
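The throughput implied by those numbers is easy to check with a back-of-the-envelope calculation, using only the figures quoted above:

```python
files = 4_500_000     # files created, read, and deleted during the shuffle
data_bytes = 1.5e12   # 1.5 TB aggregate shuffle volume
seconds = 20

file_ops_per_sec = files / seconds
throughput_gb_per_sec = data_bytes / seconds / 1e9

print(f"{file_ops_per_sec:,.0f} file lifecycles/s")   # 225,000 file lifecycles/s
print(f"{throughput_gb_per_sec:.0f} GB/s aggregate")  # 75 GB/s aggregate
```

That is roughly 225,000 complete create/write/read/delete file lifecycles per second across the cluster, which is the part that most distributed file systems (metadata servers in particular) cannot sustain.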

Ted Dunning at Quora


Other answers

"Under some circumstances, local file access is slower than remote NFS file access." Local I/O should trump remote I/O every time, regardless of the interconnect, unless there is something physically wrong with the host, it has really crappy/old storage, something is eating all of the local IOPS, or some other weird edge case applies. Consider this: chances are high that the remote storage is *also* reading from disk. Why would those disk I/Os plus a network hop be faster than the local disk I/Os without the network overhead?

As to the rest of this... it's the classic triangle problem: cheap, fast, good. Pick two. Hadoop (both MR and HDFS) is essentially cheap+fast, IMO. It picks up a lot of its speed by specifically not providing the full world of APIs (thus failing on "good") and depending on other systems to provide them; for example, there are no APIs for locking or random I/O in the traditional POSIX sense. NFS is (typically; implementations are highly variable) good+fast, but certainly not cheap when you are talking about building fault-tolerant, highly scalable storage (remember: *thousands* of clients!) at PB scales. In fact, such systems are so expensive that a lot of folks will tell you it "can't be done". It can (I've done it in a previous life), but it's going to cost you more than an arm and a leg... likely several entire family members.
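The disagreement between this answer and the question can be put in numbers with a toy latency model (the latency figures below are illustrative assumptions, not measurements): a remote read still pays the server's disk latency plus a network round trip, so with no caching local always wins; only a very high cache-hit rate on the server, as the question's prefetching argument assumes, can invert that.

```python
# Illustrative latencies in microseconds (assumptions, not benchmarks).
DISK_READ_US = 5000    # one random read on a SATA disk
NETWORK_RTT_US = 100   # round trip on a fast (10 GbE) network
CACHE_HIT_US = 10      # read served from the NFS server's RAM cache

def local_read_us():
    return DISK_READ_US

def remote_read_us(cache_hit_rate):
    # The server answers from cache when it can, from disk otherwise;
    # every remote read pays the network round trip either way.
    disk = (1 - cache_hit_rate) * DISK_READ_US + cache_hit_rate * CACHE_HIT_US
    return NETWORK_RTT_US + disk

# With no caching, remote = local + network, so local always wins ...
print(remote_read_us(0.0) > local_read_us())   # True
# ... but a 99% prefetch hit rate inverts it, as the question suggests.
print(remote_read_us(0.99) < local_read_us())  # True
```

Both sides of the argument are consistent with this model; the dispute is really over whether real workloads ever sustain the cache-hit rates that the prefetching argument requires.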

Allen Wittenauer
