How do NoSQL databases like MongoDB or RethinkDB store and retrieve JSON data from disk?
-
I am planning to learn a new language, and I am tempted to implement the simplest possible NoSQL DB engine in it. I have personally used MongoDB and loved it. Since it is based on JSON (BSON), I am interested in how the internals work: how data is retrieved from JSON-based objects, what algorithms are used, and how performance and efficiency are achieved.
-
Answer:
The internals of MongoDB are not that complicated. If you look around, you should be able to find talks online that cover the exact details, and of course you can always read the source code :) That said, here's a quick overview.

MongoDB uses memory-mapped files. This is an operating-system construct that maps locations on disk to locations in memory. Effectively, MongoDB simply pretends that the whole DB is in memory and lets the OS page data to and from disk. MongoDB uses the "fsync" command to tell the OS when it is ready to flush to disk. This raises some efficiency and reliability concerns, so there is also a journal file that keeps track of the operations performed.

When more space is required, MongoDB simply creates a new file (up to 2 GB) and maps it into memory. Each database has a set of files named after it (mydb.1, mydb.2, mydb.3, etc.). The different collections within a DB all share the same files.

Documents are written to disk in the BSON format (http://bsonspec.org/). To be precise, they are written to memory locations and then flushed to disk. To allow documents to change size, they are often padded; if you look at the collection statistics you can see the "padding factor". Documents also form a doubly-linked list: each has "next" and "prev" pointers so that a collection can be walked.

Indexes are written to the files in blocks using B-trees. They have many of the same basic characteristics and limitations as the indexes used in relational databases like MySQL.

MongoDB uses a custom wire format, so commands arrive in a very specific format that is not really BSON. The details can be found in the driver docs.

Replication and sharding are whole other complicated issues, and frankly, MongoDB does those quite poorly. All in all, MongoDB "works", but it is not particularly awesome. It has many poor architectural decisions that still plague it today and that make it hard to operate large clusters.
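To make the BSON layout more concrete, here is a minimal Python sketch of an encoder and decoder for a small subset of the element types in the spec at http://bsonspec.org/ (double, string, embedded document, boolean, null, int32, int64). It is an illustration under those assumptions, not MongoDB's own code: arrays, ObjectId, dates, binary data and the other BSON types are left out, and the function names `encode` and `decode` are hypothetical. It shows two properties that matter for storage: every document and every string value carries a little-endian length prefix, and field names are stored in full, as null-terminated strings, inside every document.

```python
import struct

# Element type bytes from the BSON spec; only this subset is handled in this sketch.
DOUBLE, STRING, DOCUMENT, BOOLEAN, NULL, INT32, INT64 = 0x01, 0x02, 0x03, 0x08, 0x0A, 0x10, 0x12


def encode(doc: dict) -> bytes:
    """Encode a dict as BSON: int32 total length, the elements, then a trailing 0x00."""
    body = bytearray()
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"  # field names are null-terminated cstrings
        if isinstance(value, bool):  # test bool before int: bool is a subclass of int
            body += bytes([BOOLEAN]) + name + (b"\x01" if value else b"\x00")
        elif isinstance(value, int) and -2**31 <= value < 2**31:
            body += bytes([INT32]) + name + struct.pack("<i", value)
        elif isinstance(value, int):
            body += bytes([INT64]) + name + struct.pack("<q", value)
        elif isinstance(value, float):
            body += bytes([DOUBLE]) + name + struct.pack("<d", value)
        elif isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"  # string length includes the 0x00
            body += bytes([STRING]) + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, dict):
            body += bytes([DOCUMENT]) + name + encode(value)  # embedded docs nest recursively
        elif value is None:
            body += bytes([NULL]) + name  # null has no value bytes
        else:
            raise TypeError(f"unsupported type for {key!r}: {type(value).__name__}")
    # The length prefix counts itself (4 bytes), the elements and the trailing 0x00.
    return struct.pack("<i", len(body) + 5) + bytes(body) + b"\x00"


def decode(buf: bytes, offset: int = 0) -> dict:
    """Decode the BSON document that starts at `offset` in `buf`."""
    total = struct.unpack_from("<i", buf, offset)[0]
    end = offset + total - 1  # index of the document's trailing 0x00
    pos = offset + 4
    doc = {}
    while pos < end:
        etype = buf[pos]
        key_end = buf.index(b"\x00", pos + 1)
        key = buf[pos + 1 : key_end].decode("utf-8")
        pos = key_end + 1
        if etype == DOUBLE:
            doc[key] = struct.unpack_from("<d", buf, pos)[0]
            pos += 8
        elif etype == STRING:
            length = struct.unpack_from("<i", buf, pos)[0]  # includes the trailing 0x00
            doc[key] = buf[pos + 4 : pos + 3 + length].decode("utf-8")
            pos += 4 + length
        elif etype == DOCUMENT:
            doc[key] = decode(buf, pos)
            pos += struct.unpack_from("<i", buf, pos)[0]  # skip the whole embedded doc
        elif etype == BOOLEAN:
            doc[key] = buf[pos] == 1
            pos += 1
        elif etype == NULL:
            doc[key] = None
        elif etype == INT32:
            doc[key] = struct.unpack_from("<i", buf, pos)[0]
            pos += 4
        elif etype == INT64:
            doc[key] = struct.unpack_from("<q", buf, pos)[0]
            pos += 8
        else:
            raise ValueError(f"unsupported BSON type 0x{etype:02x}")
    return doc


encoded = encode({"hello": "world"})
print(encoded)  # b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
print(decode(encoded))  # {'hello': 'world'}
```

Decoding walks the elements in order, using the type byte and length prefixes to find where each value ends; there is no per-document index, so reading one field means scanning from the start of the document. Because the outer length prefix is known up front, a storage layer can also tell how many bytes a record occupies without decoding it, which is what makes padding and in-place updates workable.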
Some ideas to consider, things you could improve:
- Memory-mapped files are "easy", but they're not particularly safe: http://ayende.com/blog/162791/on-memory-mapped-files
- The BSON format is really inefficient. It takes up a lot of space because it stores both keys and values with no compression, and loading a document requires loading the entire document into memory to access any part of it.
- There are many possible variants on B-trees that are used to optimize performance, especially for things like "count" operations. Those are things you need to design in advance.
- There is no compression of data. This sounds like a great spot for improvement.
- MongoDB uses a single write lock per DB. You should consider how much concurrency you want to allow.
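The points about BSON's space usage and the lack of compression are easy to measure yourself. The sketch below is a hypothetical experiment, not a benchmark of MongoDB: it encodes 1,000 made-up sensor documents as BSON using only int32 fields (type byte 0x10), reports how much of each encoded document is spent on the repeated field names, and compresses the concatenated records with Python's standard `zlib` module as one possible compressor. The helper names `encode_int32_doc` and `key_overhead` are illustrative only.

```python
import struct
import zlib


def encode_int32_doc(doc: dict) -> bytes:
    """Encode a flat dict of int32 values as a BSON document (0x10 elements only)."""
    body = b"".join(
        b"\x10" + key.encode("utf-8") + b"\x00" + struct.pack("<i", value)
        for key, value in doc.items()
    )
    # Length prefix covers itself (4 bytes), the elements and the trailing 0x00.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def key_overhead(doc: dict) -> float:
    """Fraction of the encoded document taken up by field names and their terminators."""
    key_bytes = sum(len(key.encode("utf-8")) + 1 for key in doc)
    return key_bytes / len(encode_int32_doc(doc))


# Hypothetical data: every document repeats the same two field names.
docs = [{"temperature": 20 + i % 10, "humidity": 40 + i % 25} for i in range(1000)]
raw = b"".join(encode_int32_doc(d) for d in docs)
compressed = zlib.compress(raw)

print(f"key overhead per document: {key_overhead(docs[0]):.0%}")  # 58%
print(f"raw: {len(raw)} bytes, zlib: {len(compressed)} bytes")  # raw is 36000 bytes; zlib size varies
```

Compressing one large stream like this is the simplest case. An engine that still needs to locate and update individual documents would more likely compress per block or page, which would probably give up some of the ratio in exchange for random access; alternatively, field names could be replaced by small integer IDs from a per-collection dictionary. Either choice affects the on-disk format, so it is worth deciding early.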
Gaëtan Voyer-Perrault at Quora
Other answers
Cassandra is a distributed database from Apache that is highly scalable and designed to manage very large amounts of structured data. It provides high availability with no single point of failure. The tutorial starts off with a basic introduction to Cassandra, followed by its architecture, installation, and important classes and interfaces. Thereafter, it covers how to perform operations such as create, alter, update, and delete on keyspaces, tables, and indexes using CQLSH as well as the Java API (http://ow.ly/Xdpw300Dyn1). The tutorial also has dedicated chapters explaining the data types and collections available in CQL and how to make use of user-defined data types...

Cassandra is a highly scalable, high-performance distributed database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It is a type of NoSQL database. Let us first understand what a NoSQL database does.

A NoSQL database (sometimes called "Not Only SQL") is a database that provides a mechanism to store and retrieve data other than the tabular relations used in relational databases. These databases are schema-free, support easy replication, have a simple API, are eventually consistent, and can handle huge amounts of data.
Shweta Tiwari