
Did Facebook develop a custom in-memory database to manage its News Feeds?

  • An article by Mike Schroepfer discussing Facebook's architecture mentions a custom in-memory database that handles queries in real time without needing to hit the database. Did Facebook develop an in-memory database to avoid hitting disk-based MySQL for writes, or was Mike talking about a database built on top of the Memcached cache? Could a 3-tier architecture work where writes are handled by an in-memory DB like MySQL Cluster, reads are handled by a Memcached layer, and a MySQL database is used as persistent metadata storage and backup? In this hypothetical architecture, each write request would go to MySQL Cluster, and the data would then be asynchronously replicated to both Memcached and the MySQL database. Each subsequent read request would go to Memcached first, then to MySQL Cluster (if the cache had expired), and as a last resort to the MySQL database. Below is the excerpt from Mike Schroepfer's article, with a link to the full piece:

    "Your best weapon in most computer science problems is caching. But if, like the Facebook home page, it's basically updating every minute or less than a minute, then pretty much every time I load it, it's a new page, or at least has new content. That kind of throws the whole caching idea out the window. Doing things in or near real time puts a lot of pressure on the system because the live-ness or freshness of the data requires you to query more in real time. We've built a couple systems behind that. One of them is a custom in-memory database that keeps track of what's happening in your friends network and is able to return the core set of results very quickly, much more quickly than having to go and touch a database, for example. And then we have a lot of novel system architecture around how to shard and split out all of this data. There's too much data updated too fast to stick it in a big central database. That doesn't work. So we have to separate it out, split it out, to thousands of databases, and then be able to query those databases at high speed."

    http://www.technologyreview.com/web/23508/page1/
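The read/write path proposed in the question can be sketched in a few lines. This is only an illustration of the flow: the three stores here are plain Python dicts standing in for MySQL Cluster, Memcached, and MySQL (no real connections), the "async" replication is done inline for simplicity, and all names are hypothetical.

```python
import time

# Stand-ins for the three tiers; in a real deployment these would be
# clients for MySQL Cluster, Memcached, and MySQL respectively.
cluster = {}     # in-memory write tier (MySQL Cluster stand-in)
cache = {}       # read tier (Memcached stand-in): key -> (value, expiry)
persistent = {}  # durable tier (MySQL stand-in)

CACHE_TTL = 60  # seconds; illustrative expiry window

def write(key, value):
    """Writes land in the in-memory cluster, then replicate out.
    (A real system would replicate to the other tiers asynchronously.)"""
    cluster[key] = value
    cache[key] = (value, time.time() + CACHE_TTL)
    persistent[key] = value

def read(key):
    """Memcached first, then MySQL Cluster, then MySQL as a last resort."""
    entry = cache.get(key)
    if entry is not None:
        value, expiry = entry
        if time.time() < expiry:
            return value      # cache hit
        del cache[key]        # expired entry, fall through
    if key in cluster:
        value = cluster[key]
    else:
        value = persistent[key]  # last resort: durable tier
    cache[key] = (value, time.time() + CACHE_TTL)  # repopulate the cache
    return value
```

Deleting the cache entry (or letting it expire) makes the next `read` fall through to the cluster, and deleting both makes it fall through to the persistent tier, matching the escalation order the question describes.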

  • Answer:

    Yes. Most of this stuff is not secret. We presented the basics of the newsfeed backend at f8 a couple of years ago, and at a few other tech talks. It's nothing complicated: just an in-memory store of recent actions taken by each user on the site. These actions can be queried and merged into stories, which can then be ranked and returned to the user. The challenges were mainly in dealing with the huge load, as well as robustly handling various failure and timeout scenarios.

    The basic architecture is similar to an in-memory search cluster. There are a bunch of leaf nodes that store user actions in RAM (on the heap), sharded by actor. The leaves support a variety of filtering primitives, so you can have them return only actions of a certain type, or with particular attributes. Then there are aggregation nodes that query the leaves, merge and rank the results, and return them to the client.

David Braginsky at Quora
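The leaf/aggregator split described in the answer above can be sketched roughly as follows. The class and field names are illustrative, not Facebook's, and sorting by recency stands in for the actual (unpublished) story ranking.

```python
from collections import defaultdict

class LeafNode:
    """Holds recent user actions in RAM, keyed (sharded) by actor."""
    def __init__(self):
        self.actions_by_actor = defaultdict(list)

    def record(self, actor, action_type, timestamp, payload):
        self.actions_by_actor[actor].append(
            {"actor": actor, "type": action_type,
             "ts": timestamp, "payload": payload}
        )

    def query(self, actors, action_type=None):
        """Filtering primitive: return matching actions for the given actors."""
        out = []
        for actor in actors:
            for action in self.actions_by_actor.get(actor, []):
                if action_type is None or action["type"] == action_type:
                    out.append(action)
        return out

class Aggregator:
    """Fans a query out to the leaves, then merges and ranks the results."""
    def __init__(self, leaves):
        self.leaves = leaves

    def feed(self, friend_ids, action_type=None, limit=10):
        merged = []
        for leaf in self.leaves:
            merged.extend(leaf.query(friend_ids, action_type))
        merged.sort(key=lambda a: a["ts"], reverse=True)  # recency as a stand-in rank
        return merged[:limit]
```

In a real deployment each `record` call would be routed to one leaf by hashing the actor (e.g. `hash(actor) % num_leaves`), which is what "sharded by actor" buys: a query for a user's friends only needs to touch the shards holding those friends.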


Other answers

Was interested in one of your specific questions: "Could a 3-tier architecture work where writes are handled by an in-memory DB like MySQL Cluster, reads are handled by a Memcached layer, and MySQL database is used as persistent metadata storage and backup?" The preview release of MySQL Cluster includes a native memcached interface, so you could actually consolidate all tiers down to one with MySQL Cluster: reads, writes, and persistence. There is a good blog post on how to configure this and where to get the code: http://www.clusterdb.com/mysql-cluster/scalabale-persistent-ha-nosql-memcache-storage-using-mysql-cluster/

Mat Keep
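The consolidation Mat describes means a standard memcached client talks directly to MySQL Cluster, which both serves from memory and persists the data, so the separate cache and backup tiers disappear. A minimal sketch of that single-tier idea, with a plain Python class standing in for the cluster (no real memcached or MySQL Cluster connection; the `log` list is a hypothetical stand-in for NDB's redo log and checkpoints):

```python
class ConsolidatedStore:
    """One tier instead of three: a memcached-style get/set/delete API
    whose backing store is both in-memory and durable."""
    def __init__(self):
        self.data = {}  # in-memory state: serves reads AND writes
        self.log = []   # durability stand-in (redo log / checkpoint in real NDB)

    def set(self, key, value):
        self.data[key] = value
        self.log.append(("set", key, value))  # "persisted" alongside the write
        return True

    def get(self, key):
        return self.data.get(key)  # no separate cache tier to check first

    def delete(self, key):
        self.data.pop(key, None)
        self.log.append(("delete", key, None))

    def recover(self):
        """Rebuild in-memory state from the durable log after a restart."""
        data = {}
        for op, key, value in self.log:
            if op == "set":
                data[key] = value
            else:
                data.pop(key, None)
        self.data = data
```

The design point is that durability and caching are no longer separate systems to keep consistent: a crash loses only the in-memory dict, and `recover` rebuilds it from the log, which is roughly the role NDB's disk checkpoints play for MySQL Cluster's data nodes.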

Facebook, like other hyperscalers, deploys its infrastructure at the rack level, and its software is designed to make the fullest possible use of the compute, storage, and networking shared across the rack to run specific workloads. At the Open Compute Summit this week in San Jose, Facebook representatives and manufacturers of Open Compute gear were showing off some new hardware toys, which we will cover separately. Facebook, interestingly, gave a peek inside the current configurations of its racks, which are used to support web, database, and various storage workloads. The Type VI rack is used for heavy cache applications like the Facebook News Feed, ad serving, and search. There are no Knox disk arrays in this setup, but each of the 30 Leopard servers has 256 GB of main memory (a high configuration) and a 2 TB disk drive (a midsized one in Facebook's categorization). We don't know this for certain, but the rack appears to be a high-compute/high-memory setup designed to accelerate fast access to data, and it probably has a high core count per CPU. You might wonder why the News Feed is backed by disk instead of flash: the News Feed data quickly outgrew the size of the flash, so they ultimately had to move to disk. With flash drives now pushing 10 TB, size is no longer the issue, but cost still is.

Ashutosh Tripathi
