There is no way to set a fixed number of regions, but you can control the number of regions indirectly by changing the maximum size of a region. This is done by modifying the setting hbase.hregion.max.filesize (at least on version 0.20), which defaults to 256 MB if I remember correctly.
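For illustration only, the same limit can also be overridden per table through the client API. Here is a minimal sketch, assuming the modern (2.x) Java client and a hypothetical table "events" with a family "d"; the cluster-wide default still comes from hbase.hregion.max.filesize in hbase-site.xml:

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import java.io.IOException;

// Sketch: a lower per-table max file size makes regions split sooner (more,
// smaller regions); a higher value means fewer, bigger regions.
static void createEventsTable(Admin admin) throws IOException {
    TableDescriptor desc = TableDescriptorBuilder
            .newBuilder(TableName.valueOf("events"))              // hypothetical table name
            .setColumnFamily(ColumnFamilyDescriptorBuilder.of("d"))
            .setMaxFileSize(1L * 1024 * 1024 * 1024)              // split regions at ~1 GB
            .build();
    admin.createTable(desc);
}
```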

From the hbase-user mailing list: "There is no fixed limit, it has much more to do with the read/write load than the actual dataset size.
HBase is usually fine having very densely packed RegionServers, if much of the data is rarely accessed. If you have extremely high numbers of regions per server and you are writing to all of these regions, or even reading from all of them, you could have issues." [1]
[1] http://search-hadoop.com/m/cyFfl1SHnbD
Reads and writes are always sent to the only server serving the region at any given time. HBase does not replicate in-memory - only the storage is replicated, and HDFS is used for that. Without synchronous replication for in-memory database pages and caches, any CP system will direct all operations to a single key to a single server (see CAP theorem) to ensure consistency.
I will first describe failover, which is a bit simpler than balancing: when a region server (RS) is detected as down, a new RS is picked by a pluggable strategy (by default, a random RS). The new server starts and initially works with potentially remote HDFS files, but HDFS locality is restored at the first compaction (the new files are created with a local copy on the RS).
The RS does not have a “copy” of the data beyond some memory footprint. All the data is in HDFS, replicated there between data nodes. So the two horizontally scalable services, HDFS and HBase, are typically set up on the same set of hosts, and HBase uses HDFS, but they are completely separate: HBase only has memory storage of its own, while the disks belong to HDFS. And yes, the memory allocated to the departed region is of course freed to be used by others.
Nothing.
It is HBase's default behavior to scale out (auto-shard) across region servers, and regions are simply the unit of scalability for that.
Regions can be compared to row partitions in RDBMS.
Whenever a region becomes full (based on the region size configured in hbase-site.xml), it gets split at its middle row key.
The more regions there are, the more input splits get created for a MapReduce job that uses HBase as its source.
Usually, number of regions = number of input splits for HBase MapReduce jobs.
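To make the splits/regions relationship concrete, here is a minimal sketch of an HBase-as-source MapReduce job, assuming the standard TableMapReduceUtil helpers and a hypothetical table "events"; TableInputFormat creates roughly one input split (and therefore one map task) per region:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

// Sketch: map-only job over an HBase table; roughly one map task per region.
public class CountRowsJob {
    static class RowCounterMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context) {
            context.getCounter("stats", "rows").increment(1);   // just count rows
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "count-rows");
        job.setJarByClass(CountRowsJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);          // bigger scanner caching for a full-table scan
        scan.setCacheBlocks(false);    // keep MR scans out of the block cache

        TableMapReduceUtil.initTableMapperJob(
                "events", scan, RowCounterMapper.class,
                NullWritable.class, NullWritable.class, job);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```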
If I were to be in marketing, I would say you can have an unlimited number of column families... but I am not. If I put my technical hat on, I have to advise you to use only as many families as are needed based on your use-case. For reference, please also see my answer to HBase Region Server guidelines give a size range of about 1TB, whereas data nodes are configured 20 times bigger. Why?.
Let's split this into write and read problems as well.
Writes
The issue is that column families further divide the heap space set aside for writing, as explained in the answer above. When you have one family and 32 active regions, you are fine flush-size-wise. If you had 10 families, for example, then you need to see what that means for your flush files. If all of them were written to at the same rate, you would end up with 10 flush files, each a tenth of the configured flush size. You can mitigate this by increasing the flush size by a factor of, say, 10.
If you do not write at the same rate, then you end up with the largest family triggering the flush process - which is synchronized across all stores, aka column families (store is the internal name). This will mean you end up possibly with many small files for those families that are barely written to. Those then need to be merged into the larger, older flush files during compactions, and that will cause compaction storms. These are constant rewrites of many MB or GB just to merge in a few updates. In other words, your write amplification will be costly.
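A minimal sketch of that mitigation, assuming the 2.x client API and hypothetical table/family names; the cluster-wide default comes from hbase.hregion.memstore.flush.size, and a per-table override is shown here:

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

// Sketch: the flush threshold applies per region, so when several families
// share it, each family's flush files shrink; raising the threshold keeps
// the individual flush files at a reasonable size.
static TableDescriptor wideFlushTable() {
    return TableDescriptorBuilder.newBuilder(TableName.valueOf("metrics"))   // hypothetical names
            .setColumnFamily(ColumnFamilyDescriptorBuilder.of("raw"))
            .setColumnFamily(ColumnFamilyDescriptorBuilder.of("agg"))
            .setMemStoreFlushSize(512L * 1024 * 1024)   // default would come from
                                                        // hbase.hregion.memstore.flush.size
            .build();
}
```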
Reads
For reads we have to distinguish between regions that are also written to and regions that are only read from. For the former, the write considerations above outweigh the reads. If you have regions that are not updated but only read from, then the number of column families dictates how many 64 KB (default) block loads are needed - one for every store that is part of the query.
If you can limit a typical query to, say, 1 or 2 of the 10 families, then you load only 1-2 blocks at a minimum. If you read from all 10 families, then you read 10 blocks from disk if they are not cached already. Simply put, this means more churn on your block cache. It also means more latency to read the entire row.
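A minimal sketch of keeping a read narrow, with hypothetical table/family names; only the stores named in the Get have their blocks loaded:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: touch only the families the query needs, instead of all of them.
static Result readProfile(Table table, byte[] rowKey) throws IOException {
    Get get = new Get(rowKey);
    get.addFamily(Bytes.toBytes("profile"));                             // whole family
    get.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("updated_at"));   // one column from a second family
    return table.get(get);
}
```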
Summary
The take-away is that you can certainly have many column families (theoretically Integer.MAX_VALUE), but in practice it depends on your use-case:
- Read/Write with Same Rate Fill: increase the flush size accordingly.
- Read/Write with Skewed Fill: will probably cause compaction storms - avoid!
- Read Many/Write Few Regions: make sure you understand the latency implications of block loads.
If your use-case is highly selective across families, you should be fine for the last case.
Hope that helps,
Lars
Theoretically no, you can simply set the number of versions to Integer.MAX_VALUE and store as many as you like.
The thing to watch out for is that this makes the rows potentially very wide, and HBase cannot split a row. In other words, with millions of versions in one row you end up with a few rows - or even just one row - per region. That can be OK, but it defeats the purpose of, for example, splits, compactions, and block caches.
You need to be aware of the total size of each row or you may face weird issues down the road.
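For reference, a minimal sketch (2.x API, hypothetical family name) of setting the version count on a column family; a large but finite value is often a safer choice than Integer.MAX_VALUE given the caveats above:

```java
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: keep many versions, but bound them so rows stay a manageable size.
static ColumnFamilyDescriptor versionedFamily() {
    return ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("history"))
            .setMaxVersions(10_000)   // could be Integer.MAX_VALUE, but see the caveats above
            .build();
}
```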
There's not really a limit. Here are some things to consider:
Lock granularity
When you do an operation within a row, the RegionServer code briefly holds a lock on that row while applying the mutation.
On the plus side, this means that you can act atomically on several columns - concurrent readers will either see the entire update or won't see the update at all. They shouldn't (barring one or two bugs we're still stomping on) see a partial update.
On the minus side, this means that the throughput of write operations within a single row is limited (probably a few hundred per second).
We're currently working on some optimizations for specific cases like increment so that multiple incrementers can "line up" behind the lock and then batch their addition together into a single transaction.
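To make the per-row throughput point concrete, here is a minimal sketch (hypothetical names, current Java client) of a counter row; every increment against this one row contends for the same row lock:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: both counters live in the same row, so they are updated atomically
// together, but all callers serialize on that single row's lock.
static void bumpCounters(Table table, byte[] counterRow) throws IOException {
    Increment inc = new Increment(counterRow);
    inc.addColumn(Bytes.toBytes("c"), Bytes.toBytes("page_views"), 1L);
    inc.addColumn(Bytes.toBytes("c"), Bytes.toBytes("api_calls"), 1L);
    table.increment(inc);
}
```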
Region distribution
The unit of load balancing and distribution is the region, and a row will never be split across regions. So, no matter how hot a row is, it will always be served by a single server. If the data were split across many rows, you could force a split in between two hot rows to distribute load between two hosts.
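A minimal sketch of forcing such a split, assuming the Admin API and hypothetical table/key names; rows below the split point end up in one region and rows at or above it in another:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: split the region between two hot row keys so they can be served
// (and balanced) by different region servers.
static void splitBetweenHotRows(Admin admin) throws IOException {
    admin.split(TableName.valueOf("events"), Bytes.toBytes("user_5000"));
}
```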
Bugs
In prior versions of HBase there were some bugs where we would accidentally load or deserialize an entire row into RAM. So if your row is very large (100s of MBs) you may have run into serious performance issues, OOMEs, etc. I think most of these bugs have since been squashed, and the RS does a smart job of only loading the necessary columns, but it's something to be aware of.
Summary
In summary, if you don't need to do atomic operations across multiple cells, probably better to make a "tall" data layout.
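A minimal sketch of a "tall" layout, with hypothetical names: each reading gets its own row via a composite key instead of becoming another cell in one enormous row, so the data can split across regions like anything else:

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: composite row key <sensorId>#<reversed timestamp> keeps the newest
// reading first and spreads a sensor's history over many rows.
static Put tallReading(String sensorId, long epochMillis, double value) {
    byte[] rowKey = Bytes.toBytes(sensorId + "#" + (Long.MAX_VALUE - epochMillis));
    return new Put(rowKey)
            .addColumn(Bytes.toBytes("d"), Bytes.toBytes("v"), Bytes.toBytes(value));
}
```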
As per CAP theorem,
C - Consistency means a client should get the same view of the data at a given point in time regardless of which node it is read from.
A - Availability here means that any given request should receive a response [success/failure].
P - Partition Tolerance means the system remains operational despite node or other hardware failures; the system is tolerant of these kinds of failures.
In distributed systems, consistency, availability, and partition tolerance exist in a mutually dependent relationship. You cannot have all three: for example, if you choose strict consistency, you have to give up availability. So pick any two.
HBase chooses Consistency and Partition Tolerance and compromises on Availability.
[Image: a CAP-theorem diagram showing where different databases fall; courtesy of Datastax.]
There are two ways to set TTL in an HBase table.
The first way is to set it from the MapReduce job itself, using your mapper class or the MapReduce configuration.
The second way is to set it on the table itself, via the HBase shell, as an attribute of the column family.
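For reference, a minimal sketch (2.x Java API, hypothetical names) of both styles: a per-mutation TTL, which is what a mapper would attach to the cells it writes, and a family-level TTL, which is the schema attribute you would otherwise set through the shell:

```java
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: a Put whose cells expire on their own, regardless of the family TTL.
static Put shortLivedPut(byte[] rowKey) {
    Put put = new Put(rowKey)
            .addColumn(Bytes.toBytes("session"), Bytes.toBytes("token"), Bytes.toBytes("abc"));
    put.setTTL(60 * 60 * 1000L);               // milliseconds; this cell lives one hour
    return put;
}

// Sketch: a family whose cells expire a week after they are written.
static ColumnFamilyDescriptor expiringFamily() {
    return ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("session"))
            .setTimeToLive(7 * 24 * 60 * 60)   // seconds
            .build();
}
```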
Always keep the names of the CF and the columns very short, because they are repeated in every cell in physical storage… What lousy design!! Can anyone educate me on the rationale? Keeping them short will ensure a smaller disk footprint for your tables…
There is a memstore for each CF. So having fewer columns in a CF is good, as long as you are not separating logical groups of columns that are usually read or updated together.
Compaction is also done at the CF level. Not sure what best practices you can follow here.
Unless a region gets created every second, it would not. Every client caches the meta region, so the number of hits to it is small.
Regions automatically split at a certain size. But until enough splits have happened, servers are under-utilized and imbalanced, with some getting most of the traffic and others none; the few heavily hit regions are called hot spots.
By predefining the region boundaries, you start with more regions that are better distributed across the cluster, with no splitting effort while taking traffic and better balancing overall. The key structure and split points can be laid out evenly or unevenly, using whatever knowledge or guesses you have about the access patterns and the most used keys. The result is better use of the cluster and better resilience and performance of the services on top: peak traffic sustained for longer, lower latency percentiles, and higher throughput at lower cost/resources.
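A minimal sketch of creating such a pre-split table, assuming the 2.x Admin API; the table name, family, and split boundaries are hypothetical and would normally be derived from what you know about the key distribution:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: four split points -> five regions from the moment the table exists,
// so writes spread across the cluster without waiting for size-based splits.
static void createPreSplitTable(Admin admin) throws IOException {
    TableDescriptor desc = TableDescriptorBuilder.newBuilder(TableName.valueOf("events"))
            .setColumnFamily(ColumnFamilyDescriptorBuilder.of("d"))
            .build();
    byte[][] splitKeys = {
            Bytes.toBytes("2"), Bytes.toBytes("4"),
            Bytes.toBytes("6"), Bytes.toBytes("8"),   // assumes keys prefixed with a 0-9 hash bucket
    };
    admin.createTable(desc, splitKeys);
}
```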
There is no limit on the number of column families in HBase, in theory. In reality, there are several factors which can limit the usable number of column families in HBase:
- HBase Admin web UI usability. It would be very hard to show even hundreds of column families in a table configuration page.
- HDFS's practical limit on the maximum number of files - say, 100 million. If your table has N regions and M column families, you will need N×M directories to support this configuration. Every region/column family, in turn, can contain up to K store files (depending on write load and many other configuration options). With a very modest N = 100 and K = 10, the practical limit on the number of column families is below 100 million / (100 × 10) = 100K - and usually much less than 100K.
Each column family has its own directory in HDFS and its own set of store files, and from a performance point of view, the fewer directories (column families) you have, the better scan performance you get.
The MemStore stores updates in memory as sorted KeyValues, the same as it would be stored in an HFile. There is one MemStore per column family. The updates are sorted per column family.
When the MemStore accumulates enough data, the entire sorted set is written to a new HFile in HDFS. HBase uses multiple HFiles per column family, which contain the actual cells, or KeyValue instances. These files are created over time as KeyValue edits sorted in the MemStores are flushed as files to disk.
Note that this is one reason why there is a limit to the number of column families in HBase. There is one MemStore per CF; when one is full, they all flush. It also saves the last written sequence number so the system knows what was persisted so far.
The highest sequence number is stored as a meta field in each HFile, to reflect where persisting has ended and where to continue. On region startup, the sequence number is read, and the highest is used as the sequence number for new edits.

Unless the columns themselves store very large values, a few hundred columns in a single column family are unlikely to cause performance issues.
See http://permalink.gmane.org/gmane.comp.java.hadoop.hbase.user/3672 for a discussion on how limits were being improved around the 0.19 / 0.20 release timeframe.
Hi,
There is no hard limit on the number of columns in HBase; you can have more than 1 million columns, but usually no more than three column families are recommended.
If you are using something like Elasticsearch or Solr for indexing the data, it doesn't make much difference in my view.
Depending on your data access patterns, you should consider wide table vs tall table layout.
Please do not try to do join-like operations between different tables; HBase will not give good performance for that type of query.
There is no straightforward way to accomplish cross-table or even cross-row transactions in HBase. HBase gives up transactions to gain scalability. As the system exists today, you'll have to build the transaction logic on your application side. Having said that, you probably want to re-evaluate your requirement for transactions and think about doing it all in a single row. Some applications can be redesigned to leverage what HBase has to offer. Others might need sophistication on the client side or might not be a good fit for HBase at all.
Check out Apache Tephra and Apache Phoenix Transactions (beta). These are open source projects for transactions over HBase.
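As an illustration of the "do it all in a single row" approach, here is a minimal sketch (hypothetical names) using HBase's single-row atomic mutations; anything that has to span rows or tables still needs Tephra/Phoenix or application-side logic:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RowMutations;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: because everything that must change together lives in one row,
// a RowMutations batch applies atomically under that row's lock.
static void settleAccount(Table table, byte[] accountRow, long newBalance) throws IOException {
    RowMutations rm = new RowMutations(accountRow);
    rm.add(new Put(accountRow)
            .addColumn(Bytes.toBytes("acct"), Bytes.toBytes("balance"), Bytes.toBytes(newBalance)));
    rm.add(new Delete(accountRow)
            .addColumns(Bytes.toBytes("acct"), Bytes.toBytes("pending")));
    table.mutateRow(rm);
}
```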
HBase is already partitioned on the primary key by default, so you can incorporate your partitioning into the HBase schema design. I think “HBase: The Definitive Guide” by Lars George still remains the classic book, even though there are a lot of other presentations out there on how to do it properly. I have also seen suggestions of creating one big HBase table where each individual table is a partition, but this is probably going too far…