Did Facebook develop a custom in-memory database to manage its News Feeds?
-
An article from Mike Schroepfer discussing Facebook's architecture mentions a custom in-memory database that handles queries in real time without the need for hitting the database. Did Facebook develop an in-memory database to avoid hitting the disk-based MySQL for writes or was Mike just talking about a database built leveraging the cache in Memcached? Could a 3-tier architecture work where writes are handled by an in-memory DB like MySQL Cluster, reads are handled by a Memcached layer, and MySQL database is used as persistent metadata storage and backup? In this type of hypothetical architecture, each request for a write would go to MySQL Cluster, then the data would be asynchronously replicated to both Memcached and MySQL database. Each subsequent read request would be sent to Memcached first, then to MySQL Cluster (if the cache expired), and as the last resource it would go to MySQL database. Below is the excerpt from Mike Schroepfer's article and the link to the full article: "Your best weapon in most computer science problems is caching. But if, like the Facebook home page, it's basically updating every minute or less than a minute, then pretty much every time I load it, it's a new page, or at least has new content. That kind of throws the whole caching idea out the window. Doing things in or near real time puts a lot of pressure on the system because the live-ness or freshness of the data requires you to query more in real time. We've built a couple systems behind that. One of them is a custom in-memory database that keeps track of what's happening in your friends network and is able to return the core set of results very quickly, much more quickly than having to go and touch a database, for example. And then we have a lot of novel system architecture around how to shard and split out all of this data. There's too much data updated too fast to stick it in a big central database. That doesn't work. So we have to separate it out, split it out, to thousands of databases, and then be able to query those databases at high speed." http://www.technologyreview.com/web/23508/page1/
-
Answer:
Yes Most of this stuff is not secret. We presented the basics of the newsfeed backend at f8 a couple of years ago, and at a few other tech-talks. It's nothing complicated. Just an in-memory store of recent actions taken by each user on the site. These actions can be queried and merged into stories which can then be ranked and returned to the user. The challenges were mainly in dealing with the huge load, as well as robustly handling various failure and timeout scenarios. The basic architecture is similar to an in-memory search cluster. There are a bunch of leaf nodes that store user actions in RAM (on the heap), sharded by actor. The leaves support a variety of filtering primitives, so you can just have them return actions of a certain type, or with particular attributes. Then there are aggregation nodes that query the leaves, merge and rank the results, and return them to the client.
David Braginsky at Quora Visit the source
Other answers
Was interested in 1 of your specific questions: "Could a 3-tier architecture work where writes are handled by an in-memory DB like MySQL Cluster, reads are handled by a Memcached layer, and MySQL database is used as persistent metadata storage and backup?" The preview release of MySQL Cluster includes a native memcached interface, so you could actually consolidate all tiers down to 1 with MySQL Cluster - reads, writes and persistence There is a good blog as follows on how to configure this and where to get the code: http://www.clusterdb.com/mysql-cluster/scalabale-persistent-ha-nosql-memcache-storage-using-mysql-cluster/
Mat Keep
Facebook, like other hyperscalers, deploys its infrastructure at the rack level and software is designed to make the fullest use possible of the compute, storage, and networking shared across the rack to run specific workloads. At the Open Compute Summit this week in San Jose, Facebook representatives and manufacturers of Open Compute gear were showing off some new hardware toys, which we will cover separately. Facebook, interestingly, gave a peek inside the current configurations of its racks, which are used to support web, database, and various storage workloads. The Type VI rack is used for heavy cache applications like the Facebook News Feed, ad serving, and search. There are no Knox disk arrays in this setup, but each of the 30 Leopard servers has 256 GB of main memory (a high configuration) and a 2 TB disk drive (a midsized one in Facebookâs categorization). We donât know this, but the rack is a high compute/high memory setup designed to accelerate fast access to data, and it probably has a high core count per CPU. You might wonder that the News Feed is backed by disk instead of flash, but since the size of the News Feed data quickly outgrew the size of the flash, they ultimately had to move on to disk. Obviously, with flash drives now pushing 10 TB, size is not the issue, but cost still is !!!
Ashutosh Tripathi
Related Q & A:
- How to develop a mobile payment system?Best solution by itproportal.com
- How to develop a web application?Best solution by Stack Overflow
- How to develop a photographic memory?Best solution by Yahoo! Answers
- How can I develop a customer base for selling cars?Best solution by briantracy.com
- How to develop a formula for cos3x?Best solution by answers.yahoo.com
Just Added Q & A:
- How many active mobile subscribers are there in China?Best solution by Quora
- How to find the right vacation?Best solution by bookit.com
- How To Make Your Own Primer?Best solution by thekrazycouponlady.com
- How do you get the domain & range?Best solution by ChaCha
- How do you open pop up blockers?Best solution by Yahoo! Answers
For every problem there is a solution! Proved by Solucija.
-
Got an issue and looking for advice?
-
Ask Solucija to search every corner of the Web for help.
-
Get workable solutions and helpful tips in a moment.
Just ask Solucija about an issue you face and immediately get a list of ready solutions, answers and tips from other Internet users. We always provide the most suitable and complete answer to your question at the top, along with a few good alternatives below.