What is Facebook's architecture?
Answer:
From various readings and conversations, my understanding of Facebook's current architecture is:
- The Web front-end is written in PHP. Facebook's HipHop Compiler [1] converts it to C++ and compiles it with g++, providing a high-performance templating and Web logic execution layer.
- Because of the limitations of relying entirely on static compilation, Facebook has started work on a HipHop Interpreter [2] as well as a HipHop Virtual Machine, which translates PHP code to HipHop ByteCode [3].
- Business logic is exposed as services using Thrift [4]. These services are implemented in PHP, C++ or Java depending on service requirements (other languages are probably used as well).
- Services implemented in Java don't run on a usual enterprise application server but on Facebook's custom application server. At first this can look like reinventing the wheel, but since these services are exposed and consumed only (or mostly) through Thrift, the overhead of Tomcat, or even Jetty, was probably too high with no significant added value for their needs.
- Persistence is done using MySQL, Memcached [5] and Hadoop's HBase [6]. Memcached is used as a cache for MySQL as well as a general-purpose cache.
- Offline processing is done using Hadoop and Hive.
- Data such as logging, clicks and feeds transit through Scribe [7] and are aggregated and stored in HDFS using Scribe-HDFS [8], allowing extended analysis using MapReduce.
- BigPipe [9] is their custom technology to accelerate page rendering using pipelining logic.
- Varnish Cache [10] is used for HTTP proxying. They preferred it for its high performance and efficiency [11].
- The storage of the billions of photos posted by users is handled by Haystack, an ad hoc storage solution developed by Facebook which brings low-level optimizations and append-only writes [12].
- Facebook Messages uses its own architecture, which is notably based on infrastructure sharding and dynamic cluster management. Business logic and persistence are encapsulated in a so-called 'Cell'. Each Cell handles a portion of the users; new Cells can be added as popularity grows [13]. Persistence is achieved using HBase [14].
- Facebook Messages' search engine is built with an inverted index stored in HBase [15].
- Facebook Search Engine's implementation details are unknown as far as I know.
- The typeahead search uses a custom storage and retrieval logic [16].
- Chat is based on an Epoll server developed in Erlang and accessed using Thrift [17].
- They've built an automated system that responds to monitoring alerts by launching the appropriate repair workflow, or escalating to humans if the outage can't be overcome [18].
About the resources provisioned for each of these components, some information and numbers are known:
- Facebook is estimated to own more than 60,000 servers [18]. Their recent datacenter in Prineville, Oregon is based on entirely self-designed hardware [19] that was recently unveiled as the Open Compute Project [20].
- 300 TB of data is stored in Memcached processes [21].
- Their Hadoop and Hive cluster is made of 3000 servers with 8 cores, 32 GB RAM and 12 TB disks each, for a total of 24k cores, 96 TB RAM and 36 PB of disk [22].
- 100 billion hits per day, 50 billion photos, 3 trillion objects cached and 130 TB of logs per day as of July 2010 [22].

[1] HipHop for PHP: http://developers.facebook.com/blog/post/358
[2] Making HPHPi Faster: http://www.facebook.com/note.php?note_id=10150336948348920
[3] The HipHop Virtual Machine: http://www.facebook.com/note.php?note_id=10150415177928920
[4] Thrift: http://thrift.apache.org/
[5] Memcached: http://memcached.org/
[6] HBase: http://hbase.apache.org/
[7] Scribe: https://github.com/facebook/scribe
[8] Scribe-HDFS: http://hadoopblog.blogspot.com/2009/06/hdfs-scribe-integration.html
[9] BigPipe: http://www.facebook.com/notes/facebook-engineering/bigpipe-pipelining-web-pages-for-high-performance/389414033919
[10] Varnish Cache: http://www.varnish-cache.org/
[11] Facebook goes for Varnish: http://www.varnish-software.com/customers/facebook
[12] Needle in a haystack: efficient storage of billions of photos: http://www.facebook.com/note.php?note_id=76191543919
[13] Scaling the Messages Application Back End: http://www.facebook.com/note.php?note_id=10150148835363920
[14] The Underlying Technology of Messages: https://www.facebook.com/note.php?note_id=454991608919
[15] The Underlying Technology of Messages Tech Talk: http://www.facebook.com/video/video.php?v=690851516105
[16] Facebook's typeahead search architecture: http://www.facebook.com/video/video.php?v=432864835468
[17] Facebook Chat: http://www.facebook.com/note.php?note_id=14218138919
[18] Who has the most Web Servers?: http://www.datacenterknowledge.com/archives/2009/05/14/whos-got-the-most-web-servers/
[19] Building Efficient Data Centers with the Open Compute Project: http://www.facebook.com/note.php?note_id=10150144039563920
[20] Open Compute Project: http://opencompute.org/
[21] Facebook's architecture presentation at Devoxx 2010: http://www.devoxx.com
[22] Scaling Facebook to 500 million users and beyond: http://www.facebook.com/note.php?note_id=409881258919
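The Hadoop and Hive cluster totals quoted in the answer can be re-derived from the per-server figures. The snippet below is only a back-of-the-envelope check: it assumes decimal units (1 TB = 1000 GB, 1 PB = 1000 TB), and the variable names are made up for this illustration.

```python
# Back-of-the-envelope check of the Hadoop/Hive cluster totals quoted in the answer.
# Assumption: decimal units (1 TB = 1000 GB, 1 PB = 1000 TB).
SERVERS = 3000
CORES_PER_SERVER = 8
RAM_GB_PER_SERVER = 32
DISK_TB_PER_SERVER = 12

total_cores = SERVERS * CORES_PER_SERVER             # cores across the cluster
total_ram_tb = SERVERS * RAM_GB_PER_SERVER / 1000    # GB -> TB
total_disk_pb = SERVERS * DISK_TB_PER_SERVER / 1000  # TB -> PB

print(f"{total_cores:,} cores, {total_ram_tb:g} TB RAM, {total_disk_pb:g} PB disk")
# prints: 24,000 cores, 96 TB RAM, 36 PB disk
```

The result matches the 24k cores, 96 TB RAM and 36 PB of disk given in the answer.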
Michaël Figuière at Quora
Other answers
Facebook uses Linux, Apache, PHP, Memcached, Haystack, and BigPipe. This tech stack seems straightforward, but a lot of optimization is done under the hood to make it cope with the load such a popular site receives, and with the massive amounts of data people upload and request. Read the details here: http://en.wikipedia.org/wiki/Facebook http://www.slideshare.net/meet.hak/facebook-technology-stack
Alex Freska
I have written http://blog-bhaskaruni.blogspot.in/2012/12/facebook-architecture.html based on Quora and web posts. Please comment and suggest; your comments help me to write more of this kind :)
Ravi Bhaskaruni
In addition to the above:
- Data Warehousing and Analytics Infrastructure at Facebook: http://borthakur.com/ftp/sigmodwarehouse2010.pdf
- Apache Hadoop Goes Realtime at Facebook: http://borthakur.com/ftp/RealtimeHadoopSigmod2011.pdf (for more, see the blog at http://hadoopblog.blogspot.com/)
- Scalable Memory Allocation using jemalloc: http://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919
- Tornado: https://github.com/facebook/tornado (blog post: http://bret.appspot.com/entry/tornado-web-server)
- http://malteschwarzkopf.de/research/assets/facebook-stack.pdf
- https://code.facebook.com/posts/290023971344425/what-s-new-in-facebook-open-source/
Alex Kamil
Here's a very interesting presentation by a Facebook engineer about how their architecture has evolved. http://www.infoq.com/presentations/Evolution-of-Code-Design-at-Facebook
Simon Gardner
The backend is PHP, but it is precompiled using a system they developed called HipHop. The database is a highly customized MySQL database. Here are some details about the system they use: http://www.z-car.com/blog/mysql/what-database-does-facebook-use
Bart Loews
Facebook recently shared on its blog a detailed overview of its next-gen networking architecture, which it piloted at its new Altoona data center. It's quite an innovative approach for coping with their massive traffic volumes, and it goes beyond traditional approaches and protocols. Fascinating stuff. https://horovits.wordpress.com/2014/12/04/facebook-next-gen-networking-datacenter-fabric/ Another interesting bit is the recently announced enhanced Facebook Search, which is supported by big data analytics and data management foundations. Some of these were discussed a few months ago at a data faculty summit hosted by Facebook, in which Facebook shared its top open data problems. https://horovits.wordpress.com/2014/12/11/facebooks-big-data-analytics-boost-search-capabilities/
Dotan Horovits
At the scale that http://facebook.com/ operates, a lot of traditional approaches to serving web content break down or simply aren't practical. The challenge for Facebook's engineers has been to keep the site up and running smoothly in spite of handling close to half a billion active users. This article takes a look at some of the software and techniques they use to accomplish that.
Facebook's scaling challenge
Before we get into the details, here are a few factoids to give you an idea of the scaling challenge that Facebook has to deal with:
- Facebook serves 570 billion page views per month (according to Google Ad Planner).
- There are more photos on Facebook than all other photo sites combined (including sites like Flickr).
- More than 3 billion photos are uploaded every month.
- Facebook's systems serve 1.2 million photos per second. This doesn't include the images served by Facebook's CDN.
- More than 25 billion pieces of content (status updates, comments, etc.) are shared every month.
- Facebook has more than 30,000 servers (and this number is from last year!).
Software that helps Facebook scale
In some ways Facebook is still a LAMP site (kind of), but it has had to change and extend its operation to incorporate a lot of other elements and services, and modify the approach to existing ones.
For example:
- Facebook still uses PHP, but it has built a compiler for it so it can be turned into native code on its web servers, thus boosting performance.
- Facebook uses Linux, but has optimized it for its own purposes (especially in terms of network throughput).
- Facebook uses MySQL, but primarily as a key-value persistent storage, moving joins and logic onto the web servers since optimizations are easier to perform there (on the "other side" of the Memcached layer).
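To make the last point concrete, here is a minimal sketch of the idea: the database is treated as a plain key-value store, a cache sits in front of it, and the "join" (resolving friend IDs to names) happens in application code. Everything here is hypothetical: `KeyValueStore` is an in-memory stand-in, not a real MySQL or Memcached client, and the key layout and helper names are invented for the illustration.

```python
from typing import Dict, List, Optional


class KeyValueStore:
    """In-memory stand-in for a key-value backend (hypothetical, not a real client)."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}
        self.reads = 0  # number of get() calls, to make cache hits visible

    def get(self, key: str) -> Optional[dict]:
        self.reads += 1
        return self._data.get(key)

    def set(self, key: str, value: dict) -> None:
        self._data[key] = value


def cached_get(key: str, cache: KeyValueStore, db: KeyValueStore) -> Optional[dict]:
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = db.get(key)
        if value is not None:
            cache.set(key, value)  # populate the cache for later readers
    return value


def friends_with_names(user_id: int, cache: KeyValueStore, db: KeyValueStore) -> List[str]:
    """Application-side join: one key-value lookup per friend instead of a SQL JOIN."""
    user = cached_get(f"user:{user_id}", cache, db)
    if user is None:
        return []
    names = []
    for friend_id in user["friend_ids"]:
        friend = cached_get(f"user:{friend_id}", cache, db)
        if friend is not None:
            names.append(friend["name"])
    return names


db = KeyValueStore()
db.set("user:1", {"name": "Ada", "friend_ids": [2, 3]})
db.set("user:2", {"name": "Bob", "friend_ids": [1]})
db.set("user:3", {"name": "Cy", "friend_ids": [1]})
cache = KeyValueStore()

first = friends_with_names(1, cache, db)   # cold cache: every lookup reaches the database
reads_after_first = db.reads
second = friends_with_names(1, cache, db)  # warm cache: the database is not touched
print(first, reads_after_first, db.reads)
```

On the second call all three lookups are served from the cache, so the database read counter does not move. A real deployment would also need cache invalidation on writes, which this sketch leaves out.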
Then there are the custom-written systems, like Haystack, a highly scalable object store used to serve Facebook's immense amount of photos, or Scribe, a logging system that can operate at the scale of Facebook (which is far from trivial).
But enough of that. Let's present (some of) the software that Facebook uses to provide us all with the world's largest social network site.
MEMCACHED
Memcached (http://memcached.org/) is by now one of the most famous pieces of software on the internet. It's a distributed memory caching system which Facebook (and a ton of other sites) use as a caching layer between the web servers and MySQL servers (since database access is relatively slow). Through the years, Facebook has made a ton of optimizations to Memcached and the surrounding software (like optimizing the network stack).
Facebook runs thousands of Memcached servers with tens of terabytes of cached data at any one point in time. It is likely the world's largest Memcached installation.
HIPHOP FOR PHP
PHP, being a scripting language, is relatively slow when compared to code that runs natively on a server. HipHop (http://wiki.github.com/facebook/hiphop-php/) converts PHP into C++ code which can then be compiled for better performance. This has allowed Facebook to get much more out of its web servers since Facebook relies heavily on PHP to serve content.
A small team of engineers (initially just three of them) at Facebook spent 18 months developing HipHop, and it is now live in production.
HAYSTACK
Haystack (http://www.facebook.com/note.php?note_id=76191543919) is Facebook's high-performance photo storage/retrieval system (strictly speaking, Haystack is an object store, so it doesn't necessarily have to store photos). It has a ton of work to do; there are more than 20 billion uploaded photos on Facebook, and each one is saved in four different resolutions, resulting in more than 80 billion photos.
And it's not just about being able to handle billions of photos; performance is critical.
As we mentioned previously, Facebook serves around 1.2 million photos per second, a number which doesn't include images served by Facebook's CDN. That's a staggering number.
BIGPIPE
BigPipe (http://www.facebook.com/notes/facebook-engineering/bigpipe-pipelining-web-pages-for-high-performance/389414033919) is a dynamic web page serving system that Facebook has developed. Facebook uses it to serve each web page in sections (called "pagelets") for optimal performance.
For example, the chat window is retrieved separately, the news feed is retrieved separately, and so on. These pagelets can be retrieved in parallel, which is where the performance gain comes in, and it also gives users a site that works even if some part of it is deactivated or broken.
CASSANDRA
Cassandra (http://cassandra.apache.org/) is a distributed storage system with no single point of failure. It's one of the poster children for the NoSQL movement and has been made open source (it's even become an Apache project). Facebook uses it for its Inbox search.
Other than Facebook, a number of other services use it, for example Digg. We're even considering some uses for it here at Pingdom.
SCRIBE
Scribe (http://github.com/facebook/scribe) is a flexible logging system that Facebook uses for a multitude of purposes internally. It's been built to be able to handle logging at the scale of Facebook, and automatically handles new logging categories as they show up (Facebook has hundreds).
HADOOP AND HIVE
Hadoop (http://hadoop.apache.org/) is an open source map-reduce implementation that makes it possible to perform calculations on massive amounts of data. Facebook uses this for data analysis (and as we all know, Facebook has massive amounts of data).
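For readers unfamiliar with the map-reduce model mentioned here, the following is a minimal single-process sketch of its three steps (map, shuffle, reduce) applied to counting log events by type. It does not use Hadoop's API; the function names and the sample log lines are invented for the illustration, and a real job would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (event_type, 1) for every non-empty log line."""
    for line in lines:
        parts = line.split()
        if parts:
            yield parts[0], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical log lines in the form "<event_type> <path>".
log_lines = [
    "click /home",
    "view /photo/1",
    "click /photo/1",
    "click /home",
]

counts = reduce_phase(shuffle(map_phase(log_lines)))
print(counts)  # prints: {'click': 3, 'view': 1}
```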
Hive (http://hadoop.apache.org/hive/) originated from within Facebook, and makes it possible to use SQL queries against Hadoop, making it easier for non-programmers to use.
Both Hadoop and Hive are open source (Apache projects) and are used by a number of big services, for example Yahoo and Twitter.
THRIFT
Facebook uses several different languages for its different services. PHP is used for the front-end, Erlang is used for Chat, Java and C++ are also used in several places (and perhaps other languages as well). Thrift (http://incubator.apache.org/thrift/) is an internally developed cross-language framework that ties all of these different languages together, making it possible for them to talk to each other. This has made it much easier for Facebook to keep up its cross-language development.
Facebook has made Thrift open source and support for even more languages has been added.
VARNISH
Varnish (http://varnish-cache.org/) is an HTTP accelerator which can act as a load balancer and also cache content which can then be served lightning-fast.
Facebook uses Varnish to serve photos and profile pictures, handling billions of requests every day. Like almost everything Facebook uses, Varnish is open source.
Hope you got what you wanted!
Ashutosh Tripathi
In regards to Michaël Figuière's excellent response, the only thing I can see that is no longer true is their use of Cassandra. It is no longer in use inside of Facebook. This got flagged as needing to be a comment to another answer, but you can't comment on an answer as anonymous. So until Quora changes that policy, this will have to stand alone as an answer.
Anonymous
Facebook's real-time analytics system is based on Scribe, which logs all incoming links from like and comment requests on a user page. The events are stored in HDFS, then pulled out using Puma and stored in HBase in batches. I wrote a detailed post outlining the architecture here: Real Time analytics for Big Data: Facebook's New Realtime Analytics System (http://ht.ly/8OGHD). The post includes references to a video cast and other useful material.
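The "store in batches" step can be sketched roughly as follows: events are aggregated in memory and written to the store only once a batch fills up, so the store sees a few large writes instead of one write per event. This is an assumption-laden illustration, not how Puma or HBase actually work: `BatchedCounterWriter` is a hypothetical class, and a plain dict stands in for the HBase table.

```python
from collections import Counter
from typing import Dict, Tuple

Key = Tuple[str, str]  # (url, event_type)


class BatchedCounterWriter:
    """Aggregates events in memory and flushes them to a store in batches (hypothetical)."""

    def __init__(self, store: Dict[Key, int], batch_size: int) -> None:
        self.store = store          # a dict standing in for the backing table
        self.batch_size = batch_size
        self.pending: Counter = Counter()
        self.pending_events = 0
        self.flushes = 0

    def record(self, url: str, event_type: str) -> None:
        """Buffer one event; flush automatically when the batch is full."""
        self.pending[(url, event_type)] += 1
        self.pending_events += 1
        if self.pending_events >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Apply all buffered increments to the store in one pass."""
        if not self.pending:
            return
        for key, count in self.pending.items():
            self.store[key] = self.store.get(key, 0) + count
        self.pending.clear()
        self.pending_events = 0
        self.flushes += 1


store: Dict[Key, int] = {}
writer = BatchedCounterWriter(store, batch_size=3)
events = [("/a", "like"), ("/a", "like"), ("/b", "comment"), ("/a", "comment"), ("/a", "like")]
for url, event_type in events:
    writer.record(url, event_type)
writer.flush()  # write out the final partial batch

print(store, writer.flushes)
```

Five events end up as two writes to the store: one automatic flush after the third event and one final flush for the remaining two.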
Nati Shalom