Why does Quora use MySQL as the data store instead of NoSQLs such as Cassandra, MongoDB, or CouchDB?
-
I am pretty sure the founders have more experience in scalable database architectures than I do. Coming from Facebook, they also know that MySQL doesn't scale well [1] (or does it, if you do some tricks?). What were the considerations they took into account when choosing MySQL as the data store? Are they doing any JOINs over MySQL? Are there plans to switch to another DB? [1] It doesn't scale well out of the box, that is.
-
Answer:
If you partition your data at the application level, MySQL scalability isn't an issue. Facebook reported [1] running 1800 MySQL servers with just two DBAs in 2008. You can't do joins across partitions, but the NoSQL databases don't allow this anyway.

Facebook hasn't confirmed using Cassandra as the primary source for any data, and it seems like inbox search might be their only use of it. [2]

These distributed databases like Cassandra, MongoDB, and CouchDB [3] aren't actually very scalable or stable. Twitter apparently has been trying to move from MySQL to Cassandra for over a year. When someone reports using one of these systems as their primary data store for over 1000 machines for over a year, I'll reconsider my opinion on this.

<< Update as of August 2011: after I wrote this, foursquare reported an 11-hour downtime because of MongoDB. [4] Separately, a friend's startup that was going through explosive growth tried to switch to MongoDB and gave up after a month due to instability. Twitter gave up on the Cassandra migration. [5] Facebook is moving away from Cassandra. [6] HBase is getting better but is still risky if you don't have people around with a deep understanding of it. [7] >>

The primary online data store for an application is the worst place to take a risk with new technology. If you lose your database or there's corruption, it's a disaster that could be impossible to recover from. If you're not the developer of one of these new databases, and you're one of a very small number of companies using them at scale in production, you're at the mercy of the developer to fix bugs and handle scalability issues as they come up.

You can actually get pretty far on a single MySQL database and not even have to worry about partitioning at the application level. You can "scale up" to a machine with lots of cores and tons of RAM, plus a replica.
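
For the case where a single machine is no longer enough, the application-level partitioning mentioned at the start of this answer can be sketched roughly as follows. This is only an illustration under stated assumptions: the shard host names, the `shard_for` and `group_by_shard` helpers, and the choice of hashing a user ID are all hypothetical, and the answer does not say how Quora or Facebook actually map data to servers.

```python
import hashlib

# Hypothetical shard hosts; placeholders, not real servers.
SHARDS = [
    "mysql-shard-0.example.internal",
    "mysql-shard-1.example.internal",
    "mysql-shard-2.example.internal",
    "mysql-shard-3.example.internal",
]


def shard_for(user_id, shards=SHARDS):
    """Pick the shard that owns all rows for a given user ID.

    A stable hash (not Python's per-process randomized hash()) is used
    so that every application server agrees on the mapping.
    """
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(user_ids, shards=SHARDS):
    """Group user IDs by owning shard, so a multi-user read becomes
    one query per shard instead of a cross-partition join."""
    groups = {}
    for uid in user_ids:
        groups.setdefault(shard_for(uid, shards), []).append(uid)
    return groups
```

Because every row for one user lives on one shard, single-user queries can still use ordinary joins on that machine. Plain modulo hashing like this reshuffles most keys when the number of shards changes, so a real deployment would more likely use a lookup table or consistent hashing.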
If you have a layer of memcached servers in front of the databases (which are easy to scale out) then the database basically only has to worry about writes. You can also use S3 or some other distributed hash table to take the largest objects out of rows in the database. There's no need to burden yourself with making a system scale more than 10x further than it needs to, as long as you're confident that you'll be able to scale it as you grow.

Many of the problems created by manually partitioning the data over a large number of MySQL machines can be mitigated by creating a layer below the application and above MySQL that automatically distributes data. FriendFeed described a good example implementation of this [8].

Personally, I believe the relational data model is the "right" way to structure most of the data for an application like Quora (and for most user-generated content sites). Schemas allow the data to persist in a typed manner across lots of new versions of the application as it's developed, they serve as documentation, and prevent a lot of bugs. And SQL lets you move the computation to the data as necessary rather than having to fetch a ton of data and post-process it in the application everywhere.

I think the "NoSQL" fad will end when someone finally implements a distributed relational database with relaxed semantics.

--------
[1] http://www.datacenterknowledge.com/archives/2008/04/23/facebook-now-running-10000-web-servers/
[2]
[3]
[4] http://blog.foursquare.com/2010/10/05/so-that-was-a-bummer/
[5] http://engineering.twitter.com/2010/07/cassandra-at-twitter-today.html
[6] http://www.facebook.com/note.php?note_id=454991608919
[7]
[8] http://bret.appspot.com/entry/how-friendfeed-uses-mysql
Adam D'Angelo at Quora
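
The answer above suggests putting a cache layer in front of the database so that the database mostly sees writes. A minimal sketch of that read-through pattern is shown below. It is not how Quora implements it: a plain Python dict stands in for memcached, an in-memory SQLite database stands in for MySQL, and the `CachedStore` class, the `make_store` helper, and the `answers` table are all hypothetical names chosen for illustration.

```python
import sqlite3


class CachedStore:
    """Read-through cache in front of a SQL database.

    Assumptions: `cache` is a dict standing in for memcached, and `db`
    is a sqlite3 connection standing in for MySQL.
    """

    def __init__(self, db, cache=None):
        self.db = db
        self.cache = {} if cache is None else cache
        self.db_reads = 0  # counts reads that reached the database

    def get_answer(self, answer_id):
        key = f"answer:{answer_id}"
        if key in self.cache:
            return self.cache[key]  # cache hit: the database is not touched
        self.db_reads += 1
        row = self.db.execute(
            "SELECT body FROM answers WHERE id = ?", (answer_id,)
        ).fetchone()
        body = row[0] if row else None
        if body is not None:
            self.cache[key] = body  # populate the cache for later reads
        return body

    def update_answer(self, answer_id, body):
        # Writes always go to the database; the connection context
        # manager commits the transaction on success.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO answers (id, body) VALUES (?, ?)",
                (answer_id, body),
            )
        # Invalidate rather than update, so the next read refetches.
        self.cache.pop(f"answer:{answer_id}", None)


def make_store():
    """Build a store backed by a fresh in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE answers (id INTEGER PRIMARY KEY, body TEXT)")
    return CachedStore(db)
```

With this arrangement, repeated reads of the same row are served from the cache, and only the first read after each write reaches the database.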
Other answers
Better the devil you know. Especially if it's the devil everyone else knows how to beat into submission. We made it work as the database behind a real-time game server that supported tens of thousands of simultaneous users almost ten years ago. Even with the tricks we had to do back then, such as rolling our own replication and aggregation, we got it done on hardware that was 100x slower than modern quad+ core SSD-based servers.
Kevin Ernest Long
Taso Du Val
The basic answer is that companies want to avoid paying big licensing fees to Oracle or Microsoft. If you have enough $$ for your in-house DBA architects and developers, you don't need a shrink-wrapped corporate solution.
Paul Lopez
As of 2016, Cassandra has proven that it is one of the most scalable databases on the market. I should also mention that Cassandra is not a direct replacement for MySQL. Which database to run on depends on your data model. I think that Quora's question-and-answer data model is not well suited to Cassandra. However, it could be used to track user behavior and stream real-time data about what users want to read and see.
Salih Gedik
This MongoDB tutorial is designed to provide hands-on learning in the NoSQL and Big Data space. The course includes Introduction to NoSQL, MongoDB Installation, Introduction to JSON/BSON, Requirement of NoSQL, CRUD Operations, Schema Design and Data Modeling. The online certification database training further enriches your knowledge of MongoDB topics like MongoDB Backup Strategies, Monitoring, Indexing and Aggregation Framework, MongoDB Security, Integration of MongoDB with Jaspersoft, and Loading and Managing Unstructured Data (videos, images, logs, resumes, etc.). https://goo.gl/KFZFuc
- Understand the basics of NoSQL, what MongoDB is, and the installation of MongoDB
- Learn how to implement JSON/BSON data types
- Understand the requirement and scope of NoSQL in the present business scenario
- Understand scalability and availability in MongoDB using the concept of sharding
- Perform various CRUD operations to design schemas
- Understand and implement various functions like Stack, merge, Strsplit
- Gain insights into data management using MongoDB and concepts of replication
- Execute different types of indexing and aggregation
- Understand security risks to databases and the MongoDB security approach
- Learn integration of MongoDB with Java, Jaspersoft and Robomongo
- Load and manage unstructured data like videos, images, logs, resumes
MongoDB is revered as a next-generation, document-oriented NoSQL database for building high-performance operational database applications. Since organizations big and small are adopting modern ways of developing and producing database applications, MongoDB is the perfect solution. This online database training course gives a complete study of one of the most popular NoSQL databases so you can become a MongoDB expert. MongoDB is currently used by thousands of top organizations like SourceForge, Craigslist, eBay, Viacom, Foursquare and The New York Times, and there is a high demand for skilled experts in the current market. Intellipaat's certification training will fetch you top-paid jobs and take your career to the next level.
Anonymous
See for a version of this question that takes a different tack.
Alan Morrison
Just my 2 cents: you can scale MySQL, and modern solutions (like ScaleBase) even let you scale it transparently. Since they handle the partitioning for you (even joins, which is better than NoSQL), you don't have to write a single line of code to support scaling MySQL.
Liran Zelkha
Using MySQL in the case of Quora seems to be a good choice. A few companies have lately shown that MySQL scalability is feasible. MySQL can scale very well: take MySQL Cluster or Xeround as examples, which scale very well and provide great performance and throughput. Two interesting links in this context are the 451 Group's report, "NoSQL, NewSQL and Beyond: The answer to SPRAINed relational databases" (http://blogs.the451group.com/information_management/2011/04/15/nosql-newsql-and-beyond/), and the Xeround cloud database benchmark: http://xeround.com/mysql-cloud-db-overview/xeround-vs-amazon-rds-benchmark/
Avi Kapuya
Whenever I hear that MySQL doesn't scale, people are usually comparing it to an entire NoSQL cluster, which is really an "apples to oranges" comparison. I personally prefer to use a battle-tested storage engine and build partitioning and fault tolerance around it. This is the approach we take for Voldemort at LinkedIn. (We use BDB-JE for now, but it's the same philosophy.)
Vinoth Chandar