
What are the advantages and disadvantages of using a database (like HBase or MySQL) to manage massive metadata in a distributed file system, compared with using a traditional metadata cluster (like Ceph)?

  • Answer:

    I don't think there is such a thing as a traditional metadata cluster. Of the distributed filesystems I know of:

      • Some have had only one metadata server, and have barely evolved beyond that. Examples: Lustre, HDFS.

      • Some have not had a separate metadata-server role, but have combined that function with data service. Examples: PVFS, GlusterFS.

      • Only one (that non-specialists might know of) has had multiple dedicated metadata servers: Ceph.

    To answer your question further, I think we need to separate two different issues: centralized vs. distributed, and database vs. something else.

    Multiple metadata servers are necessary to protect against failure, and protecting against failure is not optional in any distributed system. Multiple active metadata servers (as opposed to active/standby) can also provide scalable performance for many kinds of operations, so they don't become bottlenecks no matter how large your storage cluster gets. The downside is that certain other types of operations become either much slower or much more complex when distributed across many servers. In particular, this is true for any kind of global "scan" operation, e.g. to detect files in need of repair or asynchronous replication, to rebalance load, or even to satisfy a user's "find" command.

    This brings us to databases vs. anything else, where "anything else" usually means the contents or attributes stored along with the files in each server's local filesystem. Scanning millions of records is what databases are designed to do, and they do it very well. For those types of operations, it would be hard to beat the performance of a single-node database-backed metadata server, so long as all of the metadata fits on one server and you don't care about failures. If the metadata has to be spread around, then scanning requires aggregating the results, and that's where distributed databases come in. Instead of letting local databases do half of the work (local scans on each server) and then having to do the other half (aggregating across servers) in the filesystem, putting all metadata into a distributed database makes the entire thing Somebody Else's Problem. Now that such databases have developed to the point where some of them can offer both the scalability and the consistency that are needed for filesystem metadata, that approach has become viable.

    The downside of putting metadata into a database, whether it's local or distributed, is that it then requires database-specific tools to examine or modify it. That's certainly not very welcome for the developers (I can attest to that personally) and my impression is that it's not very welcome for administrators either. Once you're talking about distributed filesystems, you're talking about lots of data, and people responsible for that much data aren't very fond of having a critical piece hidden in a black box where they can only interact with it in ways the developers had anticipated. There's also the possibility that the metadata stored within a database might get out of sync with the actual files on the servers. That shouldn't happen, of course, but bugs happen. When they do, it's nice to be able to fix things without needing specialized tools.
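
    To make the "two halves" of a global scan concrete, here is a minimal Python sketch. It assumes each metadata server keeps its records in a local SQLite database; the `files` table, its columns, and the function names are all made up for illustration and do not come from any real filesystem. Each server scans its own records, and the filesystem layer then has to merge the per-server results itself.

```python
import heapq
import sqlite3


def make_server_db(rows):
    """Build an in-memory metadata table for one hypothetical metadata server."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, size INTEGER, needs_repair INTEGER)"
    )
    db.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
    return db


def local_scan(db):
    """First half: the local database scans its own records (sorted by path)."""
    cur = db.execute("SELECT path FROM files WHERE needs_repair = 1 ORDER BY path")
    return [row[0] for row in cur]


def global_scan(servers):
    """Second half: the filesystem merges the sorted per-server results itself."""
    return list(heapq.merge(*(local_scan(db) for db in servers)))


# Three hypothetical metadata servers, each holding a slice of the namespace.
servers = [
    make_server_db([("/a/1", 10, 0), ("/a/2", 20, 1)]),
    make_server_db([("/b/1", 30, 1), ("/b/2", 40, 0)]),
    make_server_db([("/c/1", 50, 0)]),
]

damaged = global_scan(servers)
print(damaged)  # ['/a/2', '/b/1']
```

    With a distributed database, `global_scan` would in principle collapse into a single query, and the merging (plus retries, partial failures, and consistency across servers, none of which this sketch handles) would become the database's job rather than the filesystem's.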

Jeff Darcy at Quora
